Preface


This book presents a rigorous, comprehensive, modern, and detailed account 
of the mathematical methods and tools required for the semantic analysis of 
logic programs. It is, in part, the outcome of a fruitful research collabora- 
tion between the authors over the last decade or so and contains many of 
the results we obtained during that period. In addition, it discusses the work 
of many other authors and places it within the overall context of the sub- 
ject matter. A major feature of the book is that it significantly extends the 
tools and methods from the order theory traditionally used in the subject 
to include non-traditional methods from mathematical analysis depending on 
topology, generalized distance functions, and their associated fixed-point the- 
ory. The need for such methods arises for several reasons. One reason is the 
non-monotonicity of some important semantic operators, associated with logic 
programs, when negation is included in the syntax of the underlying language, 
and another arises in the context of neural-symbolic integration, as discussed 
briefly in the next paragraph and in more detail in the Introduction. Fur- 
thermore, it is our belief that certain of our results, although here focused on 
logic programming, have much wider applicability and should prove useful in 
other parts of theoretical computer science not immediately related to logic 
programming. However, we do not discuss this issue in the book in detail and 
instead we give references to the literature at appropriate places in the text 
in order to aid readers interested in investigating this point more thoroughly. 


All the well-known, important semantics in logic programming are devel- 
oped in the book from a unified point of view using both order theory and the 
non-traditional methods just alluded to, and this provides an illustration of 
the main objectives of the book. In addition, the interrelationships between 
the various semantics are closely examined. Moreover, a significant amount of 
space is devoted to examining the integration of logic programming and con- 
nectionist systems (or neural networks) from the point of view of semantics. 
Indeed, in the wide sense of integrating discrete models of computation with 
continuous models, one can expect to employ a mix of mathematical tools of 
both a discrete and continuous nature, as illustrated by the particular choice 
of models we make here. Therefore, there is a need in the study of the seman- 
tics of logic programming (and in the study of general models of computation) 
for a self-contained and detailed exposition of the development of both con- 
ventional and non-conventional methods and techniques, as just explained, 
and their interaction. This book sets out to provide such an exposition, at 
least in part, and is, we believe, unique in its content and coverage and fills a
significant gap in the literature on theoretical computer science. 

The book is mainly aimed at advanced undergraduate students, gradu- 
ate students, and researchers interested in the interface between mathemat- 
ics and computer science. It presents material from the early days of logic 
programming through to topics which are of current importance. It should 
be of special interest to those engaged in the foundations of logic program- 
ming, theoretical aspects of knowledge representation and reasoning, artifi- 
cial intelligence, the integration of logic-based systems with other models of 
computation, logic in computer science, semantics of computation, and re- 
lated topics. The book should also prove to be of interest to those engaged 
in domain theory and in applications of general topology to computer sci- 
ence. Indeed, it carries out for logic programming semantics, in a general 
model-building sense, something akin to what the well-known treatments of 
Abramsky and Jung [Abramsky and Jung, 1994] and Stoltenberg-Hansen et 
al. [Stoltenberg-Hansen et al., 1994] set out to do for the semantics of conven- 
tional programming languages. 

We have inevitably built up a considerable debt of gratitude to a num- 
ber of colleagues, collaborators, post-doctoral researchers, and post-graduate 
students during the course of conducting the research presented here. It is 
therefore a pleasure to record our thanks for insights, comments, and valu- 
able discussions to all of them. They include Sebastian Bader, Federico Banti, 
Howard Blair, Eleanor Clifford, Artur S. d’Avila Garcez, Ben Goertzel, Barbara
Hammer, Roland Heinze, Steffen Hölldobler, Achim Jung, Matthias
Knorr, Ekaterina Komendantskaya, Vladimir Komendantsky, Ralph Kopperman,
Markus Krötzsch, Kai-Uwe Kühnberger, Luis Lamb, Maire Lane,
Jens Lehmann, Tobias Matzner, Turlough Neary, John Power, Sibylla Prieß-Crampe,
Paulo Ribenboim, Bill Rounds, Sibylle Schwarz, Paweł Waszkiewicz,
Matthias Wendt, Andreas Witzel, Damien Woods, and Guo-Qiang Zhang. In 
particular, we are grateful to Sebastian Bader for his contribution to Chap- 
ter 7, and indeed this chapter was written jointly with him. 

Our acknowledgments and thanks are also due to a number of institutions 
and individuals for hosting us on a number of research projects and visits and 
to various funding agencies for making the latter possible. 

In particular, Pascal Hitzler acknowledges the support of Science Foun- 
dation Ireland; the Boole Centre for Research at University College Cork; 
University College Cork itself; the Deutscher Akademischer Austauschdienst 
(DAAD); and Case Western Reserve University, Cleveland, Ohio. While con- 
ducting the research which led to the contents of this book, P. Hitzler changed 
affiliation several times, and he is grateful to University College Cork, Ireland;
the International Center on Computational Logic at Technical University
Dresden; Case Western Reserve University, Cleveland, Ohio; the Institute
for Applied Informatics and Formal Description Methods (AIFB) at the Uni- 
versity of Karlsruhe; and the Kno.e.sis Center at Wright State University, 
Dayton, Ohio for providing excellent working environments. 
Introduction


Logic programming is programming with logic. In essence, the idea is to use 
formal logic as a knowledge representation language with which to specify a 
problem and to view computation as the (automated) deduction of new knowl- 
edge from that given. The foundations of logic programming are usually based 
upon the seminal paper of Robert Kowalski [Kowalski, 1974], which built on 
John Alan Robinson’s well-known paper [Robinson, 1965] wherein foundations 
were laid for the field of automated deduction using the resolution principle. 
These ideas gave rise, more or less simultaneously, to the programming lan- 
guage Prolog, first realized by Alain Colmerauer et al. in Marseilles in 1973, 
see [Colmerauer and Roussel, 1993]. In this computing paradigm, a knowledge 
base is given in the form of a logic program, which may be thought of as a 
conjunctive normal form of a formula in the first-order language $\mathcal{L}$ underlying
the program as defined formally in Chapters 1 and 2. Then the program, or
system, can be queried with conjunctions $Q$ of partially instantiated atomic
formulas, that is, with conjunctions of atomic formulas containing variables.
The resulting answers produced by the system are substitutions $\theta$ for these
variables by terms in $\mathcal{L}$ such that $Q\theta$ is a logical consequence of the knowledge
base. The automated deduction performed by the system is usually based on 
a restricted form of resolution called SLD(NF)-resolution, see [Apt, 1997]. 

Since this early work, logic programming has become a major program- 
ming paradigm and has developed in a considerable number of different and 
diverse directions, including automated deduction (in the context, for exam- 
ple, of model checking), natural language processing, databases, knowledge 
representation and reasoning (including applications to the Semantic Web), 
cognitive robotics, and machine learning, to mention a few. Furthermore, the 
industrial applications using the underlying technologies, Prolog in the main, 
but also an increasing number of related systems, are growing steadily more 
numerous and more and more varied.¹


¹For some examples, the proceedings of the annual International Conference on Logic
Programming (ICLP) provide a current view of the subject. The book [Bramer, 2010] contains
an introduction to Prolog programming. A standard reference for the theory underlying
Prolog programming is [Apt, 1997]. The reference [Apt and Wallace, 2007] contains
much about constraint logic programming. See [De Raedt et al., 2008] for details of current
work in (probabilistic) inductive logic programming. For information about disjunctive
logic programming systems, see [Leone et al., 2006] and the website for the DLV project at
http://www.dbai.tuwien.ac.at/proj/dlv/, and for information concerning the related system
smodels, see [Simons et al., 2002] and the website http://www.tcs.hut.fi/Software/smodels/.


This book is concerned with the theory of logic programming languages 
or, in other words, with their syntax and their semantics, especially the latter. 
Very briefly, syntax in this context deals with formal grammar and automated 
deduction, as discussed earlier; semantics, as usual, is occupied with meaning. 
We will discuss semantics in more detail next. However, it should be observed 
straightaway that the semantics of logic programming languages is compli- 
cated in a way which is peculiar to them by the introduction of negation into 
their syntax. The manner in which one handles negation is important, and 
it is worth remarking that its development in logic programming has been 
much influenced by the development of negation in non-monotonic reasoning, 
a subject familiar in the field of artificial intelligence. Therefore, it will be 
helpful to say a little about negation in these terms before describing in de- 
tail the precise objectives of the book and its contents. This is because our 
treatment of negation and semantics, see Chapter 2, is partly guided by these 
considerations and also because negation and semantics are central themes of 
the book. 

Non-monotonic reasoning came into existence as a result of the desire to 
capture certain aspects of human commonsense reasoning based on the obser- 
vation that, in many situations occurring in everyday life, humans can reach 
conclusions under incomplete or uncertain knowledge. More formally, it is typ- 
ically the case that more facts can be derived from given facts or knowledge 
when using commonsense reasoning than is the case when first-order logic is 
employed. This has the consequence that some conclusions already made may 
have to be withdrawn when more facts become known. By contrast, classical 
logics such as propositional or predicate logic are monotonic in that whenever 
a formula $F$ is entailed by a theory or set of formulas $T$, then $T \cup \{G\}$ still
entails $F$, for any formula $G$.

The non-monotonic aspect of commonsense reasoning, however, has turned 
out to be rather difficult to formalize in a satisfactory way. Early work in 
this area was mainly based on three entirely different approaches²: John
McCarthy’s circumscription, see [McCarthy, 1977, McCarthy, 1980]; Robert 
Moore’s autoepistemic logic, see [Moore, 1984, Moore, 1985]; and Ray Re- 
iter’s default logic, see [Reiter, 1980]. In fact, Prolog naturally includes some 
features which can be viewed as being non-monotonic: if the system can prove 
that a certain fact $A$ does not follow from a given knowledge base, or program,
then $A$ is considered to be false and hence $\neg A$ is considered to be true. However,
by adding the fact $A$ to the program, we can now prove $A$, and thus we
have to retract the earlier conclusion $\neg A$. (Note that the negation occurring
in $\neg A$ should not necessarily be taken here to be the negation encountered,
say, in first-order logic, but rather it symbolizes negation as (finite) failure to
prove $A$, as introduced in [Clark, 1978].)
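
The retraction just described can be illustrated with a minimal sketch in Python. It assumes a finite propositional program consisting only of definite rules, so that failure to prove an atom can be checked by exhaustive forward chaining. The rule encoding, the function names derivable and naf, and the example atoms are illustrative assumptions and are not taken from the text.

```python
def derivable(program):
    """Return the atoms derivable from a finite propositional definite program.

    Each rule is a pair (head, body) where body is a list of atoms;
    a fact is a rule with an empty body. (Hypothetical encoding.)
    """
    known = set()
    changed = True
    while changed:
        changed = False
        for head, body in program:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


def naf(program, atom):
    """Negation as failure: 'not atom' is concluded when atom is not derivable."""
    return atom not in derivable(program)


# A small knowledge base in which "penguin" cannot be proved.
program = [("bird", []), ("flies", ["bird"])]
before = naf(program, "penguin")   # 'not penguin' is concluded

# Adding the fact "penguin" forces the earlier conclusion to be retracted.
extended = program + [("penguin", [])]
after = naf(extended, "penguin")   # 'not penguin' no longer holds

print(before, after)
```

The set of derivable atoms only grows when a fact is added, but the set of negative conclusions shrinks, which is the non-monotonic behaviour referred to above.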


²See [Gabbay et al., 1994] for an excellent account of some of the main approaches to
non-monotonic reasoning including discussions of their advantages and drawbacks, and of 
the validity of the intuitions underlying non-monotonic reasoning. Introductory textbooks 
are [Antoniou, 1996, Berzati, 2007, Makinson, 2005]. 
For reasons of this sort, research into non-monotonic reasoning has influenced
research into logic programming, and vice versa, giving rise to important
and fruitful ideas and research directions in both areas. In particular,
such cross fertilization has led to the realization that logic programs, possi- 
bly augmented with some additional syntactic features, provide an excellent 
language for knowledge representation in the presence of non-monotonicity. 
In addition, such research has led to a number of implementations of non- 
monotonic-reasoning-based logic programming systems commonly known as 
answer set programming systems.³

Thus, the interaction between logic programming and non-monotonic rea- 
soning is important. It is not, however, the main focus of our work. On the 
contrary, our main focal points are, in a nutshell, first, the detailed develop- 
ment of the mathematical tools and methods required to study the semantics 
of logic programs, and second, in order to illustrate these methods, the detailed 
development of the main semantics of logic programs per se. In addition, we 
give an application of the methods we present to study semantics in the con- 
text of neural-symbolic integration, as described in more detail shortly. Thus, 
we do not treat procedural matters and matters concerned with implementa- 
tion in any depth, and indeed these issues are only touched on incidentally. We 
also do not discuss matters primarily concerned with non-monotonic reason- 
ing other than in the context of their role in guiding our thinking in relation 
to negation in logic programs, as already noted. It will therefore be of value 
to say a little more about our precise objectives, and we do this next. 

In common with most programming languages, the syntax of logic pro- 
gramming is comparatively easy to specify formally, whereas the semantics is 
much harder to deal with. Again, in common with other programming lan- 
guages, there are several ways of giving logic programs a formal semantics. 
First, logic programs have, of course, a procedural or operational semantics, 
which describes and is described by their behaviour when executed on some 
(abstract) machine. Second, unlike imperative or functional programs, logic 
programs have a natural semantics, called their declarative semantics, which 
arises simply because a logic program is a consistent set of well-formed formu- 
lae and can be viewed as a theory. This semantics is usually captured by means 
of models, in the sense of mathematical logic, and will play a dominant role 
in our development. Indeed, a central problem in the theory is the question of 
selecting the “right” model for a program, namely, a model which reflects the 
intended meaning of the programmer and relates it to what the program can 
compute. It is here that ideas from non-monotonic reasoning play a funda- 
mental role in determining the right models, including well-known ones such 
as the supported, stable, and well-founded models. Third, a standard and very 
important way of selecting the appropriate models for a logic program is to
associate with the program one or more of a number of operators called semantic
operators⁴ defined on spaces of interpretations (or valuations) determined by
the program. One then studies the fixed points of these operators, leading to
the fixed-point semantics of the program in question. This latter semantics
can roughly be equated with the denotational semantics of imperative and
functional programs associated with the names of Dana Scott and Christopher
Strachey because some, but not all, of the important semantic operators
which have been introduced are Scott continuous in the sense of domain theory,
or at least are monotonic. Moreover, fixed points play a fundamental role
also in denotational semantics. Finally, there is a general requirement that
all the semantics described previously should coincide or at least be closely
related in some sense.⁵


³For a discussion of these matters, see [Lifschitz, 1999, Marek and Truszczyński, 1999,
Baral, 2003]. For current developments in non-monotonic reasoning (versus logic programming),
one may consult the proceedings series of the International Conferences on Logic
Programming and Non-Monotonic Reasoning (LPNMR), for example.

Taking the observations just made a little further forward, we note that 
there are several interconnected strands to the programme of analyzing the 
fixed points of semantic operators, but three of the main ones are as follows. 
First, we consider a number of operators already well-known in the theory, in 
addition to introducing several more. In this step, we focus on ensuring that 
the operators we study, and their fixed points, correctly reflect the meaning 
of programs and their properties. Second, we investigate the properties of the 
operators themselves, especially in relation to whether or not they are Scott 
continuous and, if not, what properties they do possess. Scott continuity is a 
desirable feature for a semantic operator to have because it implies that the 
operator has a least fixed point. Furthermore, this least fixed point is often 
taken to be the fixed-point semantics of the program in question, and indeed, 
operators which are not Scott continuous may in general fail to have any fixed 
points at all. Third, we study the fixed-point theory of semantic operators in 
considerable generality. In fact, the failure of certain apparently reasonable 
semantic operators (already known to capture declarative semantics) to be 
Scott continuous often results from the introduction of negation, because the 
introduction of negation may render the operators in question to be non- 
monotonic and hence to fail to be Scott continuous, as we will see in Chapter 2. 
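
A small sketch in Python may make the last point concrete. It assumes the usual reading of the single-step operator for a ground propositional program: an atom belongs to T(I) exactly when some rule with that head has its body true in I. The encoding of rules as triples (head, positive body, negative body) and the function names tp and fixed_points are hypothetical choices made for illustration only.

```python
from itertools import combinations


def tp(program, interp):
    """Single-step operator for a ground propositional normal program.

    Each rule is (head, pos, neg): head holds if every atom in pos is in
    interp and no atom in neg is in interp. (Hypothetical encoding.)
    """
    return frozenset(
        head
        for head, pos, neg in program
        if all(a in interp for a in pos) and all(a not in interp for a in neg)
    )


def fixed_points(program, atoms):
    """Enumerate all two-valued interpretations over atoms fixed by tp."""
    candidates = (
        frozenset(c)
        for r in range(len(atoms) + 1)
        for c in combinations(atoms, r)
    )
    return [i for i in candidates if tp(program, i) == i]


# The one-rule program  p <- not p.
odd_loop = [("p", [], ["p"])]
empty, full = frozenset(), frozenset({"p"})

# empty is a subset of full, yet T(empty) = {p} is not a subset of T(full) = {}.
not_monotonic = not (tp(odd_loop, empty) <= tp(odd_loop, full))

print(not_monotonic)
print(fixed_points(odd_loop, ["p"]))          # no interpretation is fixed
print(fixed_points([("p", [], [])], ["p"]))   # the definite fact p has one
```

The exhaustive search is only feasible for very small sets of atoms; it is used here purely to exhibit an operator that is neither monotonic nor in possession of a fixed point.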

The point just made is important because it is one of the reasons for 
introducing alternatives to order theory in studying fixed-point theory in re- 
lation to semantics and in establishing fixed-point theorems applicable to non- 
monotonic operators, see Chapter 4. Therefore, it will help to give some in- 
sight next into the non-traditional methods we introduce and develop, how 
they work in the context of negation, and especially how they work in find- 
ing models for logic programs with negation. Our point of view is to regard 
programs, and logic programs in particular, as (abstract) dynamical systems 
whose states change under program execution and whose state changes can be 
modelled by an operator $T$. Starting with some initial state, $s_0$ say, it is
interesting to observe the behaviour of the sequence of iterates $s_0$, $T(s_0)$, $T^2(s_0)$,
$T^3(s_0)$, ... of $T$ on the state $s_0$. Suppose, for example, that $s_0$ is the nowhere
defined partial function on the natural numbers, and $T$ is the operator on
the partial functions determined in the usual way by some well-defined recursive
definition on the natural numbers, see [Stoltenberg-Hansen et al., 1994],
for example. Then, typically, the sequence of iterates will form an $\omega$-chain
as defined in Chapter 1 and will converge in the Scott topology (defined in
Chapter 3) to the supremum $s$ of the chain; thus, we have $s = \lim T^n(s_0)$ in
the Scott topology on the partial functions. Furthermore, $T$ will typically be
Scott continuous (see Chapter 3 again for the definition of Scott continuity) in
the sense that $T(s) = T(\lim s_n) = \lim T(s_n)$ whenever $s_n$ is a sequence converging
to $s$ in the Scott topology, that is, a sequence satisfying $\lim s_n = s$. If
$T$ is indeed Scott continuous, then it is now easy to deduce that $T(s) = s$ so
that $s$ is a fixed point, in fact, the least fixed point, of $T$. (These observations
are the heart of the proof of Kleene’s theorem, Theorem 1.1.9, which is sometimes
viewed as the fundamental theorem in semantics. They are also quite
close in form to the proof of the Banach contraction mapping theorem, Theorem 4.2.3,
except that it is order rather than a contraction property which
determines the convergence.) In such a situation, $s$ is usually taken to be the
meaning or semantics of the original recursive definition. Precisely the same
sort of thing happens in relation to logic programming semantics in the case of
logic programs $P$ which do not contain negation or in other words are definite
programs. Specifically, the iterates of the single-step operator (or immediate
consequence operator)⁶ $T$ applied to the empty interpretation converge in the
Scott topology to an interpretation $M$. This interpretation $M$ is the (least)
fixed point of $T$, captures well the declarative semantics for $P$, and relates well
to the procedural semantics for $P$ under SLD-resolution, see Theorem 2.2.3
and the discussion following it.


⁴This is a generic term which we use to cover all of a number of specific operators we
will study, such as the $T_P$-operator, see Definition 2.2.1.

⁵See Theorem 2.2.3, for example, and [Lloyd, 1987] for details of how procedural semantics
relates to declarative semantics.
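
The iteration from the empty interpretation described above for definite programs can be sketched in Python as follows. The sketch assumes a finite propositional definite program, for which the chain of iterates becomes stationary after finitely many steps; in general the least fixed point is the supremum of an infinite chain. The rule encoding and the names tp and kleene_iterates are hypothetical.

```python
def tp(program, interp):
    """Single-step operator for a propositional definite program.

    Each rule is (head, body) with body a list of atoms. (Hypothetical encoding.)
    """
    return frozenset(
        head for head, body in program if all(a in interp for a in body)
    )


def kleene_iterates(program):
    """Return the chain of iterates of tp starting from the empty interpretation.

    The loop stops at the first iterate that tp leaves unchanged. For a finite
    definite program the chain is increasing and bounded by the finite set of
    heads, so termination is guaranteed.
    """
    chain = [frozenset()]
    while True:
        nxt = tp(program, chain[-1])
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)


# a.   b <- a.   c <- a, b.   d <- e.
program = [("a", []), ("b", ["a"]), ("c", ["a", "b"]), ("d", ["e"])]
chain = kleene_iterates(program)
least_model = chain[-1]

for step, interp in enumerate(chain):
    print(step, sorted(interp))
```

The atom d is never derived because e has no supporting rule, so it is absent from the least fixed point.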


Following on from the comments just made in the previous paragraph is 
the interesting observation from our point of view, or the mathematical point 
of view, that the discussion just presented can quite easily be generalized: all 
that one needs is an abstract notion of convergence and an abstract notion 
of continuity. Such a setting is provided by the notion of convergence space, 
and in particular by convergence classes or equivalently by topological spaces, 
as defined in Chapter 3. These notions provide a general setting in which one 
can study semantics and in particular logic programming semantics for logic 
programs P which may or may not contain negation. The classical case of 
definite programs corresponds to taking the Scott topology, but we consider 
quite extensively another topology, called the Cantor topology by us, defined 
in Chapter 3, which is closely connected to negation, has connections with the 
Scott topology, and underlies important classes of programs which do involve 
negation such as acceptable programs and their generalizations, see Chapter 5.
Indeed, a quite elementary property we use is given in Proposition 3.3.2
and in a rather general form by Theorem 5.4.2 and simply states that if $P$
is any logic program, and $I$ is an interpretation such that $T^n(I)$ converges
in the Cantor topology to an interpretation $M$, then $M$ is a model for $P$; if,
further, $T$ is continuous in the Cantor topology, then $M$ is a fixed point of $T$
(here, again, $T$ denotes the single-step operator associated with $P$). Further
comments on this result are to be found in Remark 3.3.3 and in the comments
immediately following Remark 3.3.3. In particular, this fact is exploited on a
number of occasions to find models in the presence of negation and in particular
in studying acceptable programs, as just mentioned, and also in studying
the perfect model for locally stratified programs in Chapter 6. Indeed, the
working out of this observation together with some of its implications occupies
a significant proportion of our time. In addition, because convergence is a
key notion, our development of topology in Chapter 3 is based on it, although
the main conclusions presented there are also given in other equivalent and
familiar forms.


⁶See Chapter 2 for the definitions of these terms.
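
The property quoted above can be tried out on a small example. The following Python sketch assumes a finite ground propositional program, and it assumes that convergence in the Cantor topology amounts to the truth value of every atom being eventually constant along the sequence, so that on a finite set of atoms the iterates converge exactly when they reach a fixed point. The rule encoding and the names tp, is_model, and limit_of_iterates are hypothetical.

```python
def tp(program, interp):
    """Single-step operator; rules are (head, pos, neg). (Hypothetical encoding.)"""
    return frozenset(
        head
        for head, pos, neg in program
        if all(a in interp for a in pos) and all(a not in interp for a in neg)
    )


def is_model(program, interp):
    """A two-valued interpretation is a model when every rule with a true body
    has a true head, that is, when tp(interp) is a subset of interp."""
    return tp(program, interp) <= interp


def limit_of_iterates(program, start, max_steps=1000):
    """Return the limit of the iterates of tp from start, or None if the
    iterates have not become constant within max_steps."""
    interp = frozenset(start)
    for _ in range(max_steps):
        nxt = tp(program, interp)
        if nxt == interp:
            return interp
        interp = nxt
    return None


# q.   p <- not q.   r <- not p.
program = [("q", [], []), ("p", [], ["q"]), ("r", [], ["p"])]
limit = limit_of_iterates(program, set())

# p <- not p.   The iterates alternate between {} and {p}.
oscillating = limit_of_iterates([("p", [], ["p"])], set())

print(sorted(limit), is_model(program, limit), oscillating)
```

In the first program the iterates from the empty interpretation do not form a chain, since the second iterate contains p and the third does not, yet they still settle on a model; this is the kind of behaviour that order-theoretic arguments alone do not capture.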


In practice, detecting whether sequences converge or whether operators 
have fixed points is most easily done by means of metrics and more general 
distance functions (generalized metrics) together with their associated fixed- 
point theorems, the latter perhaps being reminiscent of the Banach contrac- 
tion mapping theorem, see Theorem 4.2.3. Furthermore, underlying the use 
of generalized metrics are topologies defined on spaces of interpretations, and 
we study these in Chapter 3 with a view to developing, in conjunction with 
Chapter 4, the mathematical analysis we apply later in Chapters 5 and 6 in 
studying acceptable programs and related semantics, as already mentioned, 
and in Chapter 7 in the context of artificial neural networks in relation to logic 
programming. This latter work concerns the problem of integrating different 
models of computation in an attempt to combine the best of each in a single 
system and understanding the semantics of the combined system. In our case, 
we consider the integration of logic programming, perhaps taken as repre- 
sentative of discrete systems, with connectionist systems, or neural networks, 
considered as continuous systems inspired by biological models of computa- 
tion. A means of doing this is to compute semantic operators by means of 
neural networks. However, in the case of first-order (non-propositional) pro- 
grams, it is necessary to employ approximation techniques (rather than exact 
computation) which depend on viewing spaces of interpretations as compact 
Hausdorff spaces, that is, to employ yet again methods from mathematical 
analysis. Such applications as these are another important reason for devel- 
oping a quite extensive body of mathematics which provides alternative tools 
to those based on order theory in studying semantics. In fact, one of the main 
highlights, themes and motivating features of this book is the analysis we carry 
out of foundational structures of various sorts, with an eye to potential ap- 
plications in the field of computational logic in general, as exemplified by our 
results in, for example, Chapter 7. Indeed, it seems probable that such meth- 
ods and tools will prove useful in developing foundations in other areas where 
discrete and continuous models of computation are combined, quite apart from 
neural-symbolic integration. Such non-classical models of computation are of
great interest generally in present times and may contain both continuous 
and discrete components, especially those inspired by physical phenomena. 
As such, their study will almost certainly require techniques appropriate to 
both their continuous elements and to their discrete elements and may well 
be of the sort developed here. 


It should be noted that other authors have, to a greater or lesser extent, 
employed mathematical analysis in the context of logic programming seman- 
tics. Their work is complementary to what we present here, and we briefly 
discuss some of it and its relationship with ours next and in more detail in the 
body of the text. For example, some of the recent work of Howard Blair and 
several of his colleagues on logic programming semantics is much concerned 
with the interaction between the continuous and the discrete, and it makes 
use of ideas from dynamical systems, convergence spaces, and automata the- 
ory to model hybrid systems. We consider this work further in Chapter 3. 
We mention also the work of Sibylla Prieß-Crampe and Paulo Ribenboim on
the role of generalized ultrametrics in fixed-point theory in the context of
logic programming semantics. They discuss both single-valued and multivalued
mappings in this context, and we consider their results in considerable
detail in Chapter 4 and some of their applications in Chapter 5. In addition, 
we also include in Chapter 4 a discussion of recent work of Umberto Straccia, 
Manuel Ojeda-Aciego, and Carlos Damásio on multivalued mappings in the
context of semantics and the relationship between their work and ours. Fi- 
nally, we discuss in Chapter 4 also the extensive work of William Rounds and 
Guo-Qiang Zhang on the use of domain theory as a theoretical foundation 
for logic programming, both from the point of view of procedural aspects and 
from the point of view of semantics. 


Summarizing the chapters, Chapter 1 contains, in fairly condensed form, 
the preliminaries from order theory, domain theory, and logic which we will 
employ throughout this book. In addition, we present two well-known fixed- 
point theorems, based on order, which are fundamental in applications to 
semantics. The next chapter, Chapter 2, introduces logic programs and the 
most important ways of assigning semantic operators and declarative seman- 
tics to them. The manner in which the material is presented is rather novel and 
employs the syntactic notion of level mapping, defined in Chapter 2. Indeed, 
we make several different applications of level mappings in our discussions, 
and they play a unifying role in several places in the course of developing our 
main themes. For example, their use in Chapter 2 provides a uniform and 
comprehensive treatment of all of the important different semantics known in 
the subject, including those associated with the supported, stable, and well-founded
models mentioned earlier.⁷ Sets of interpretations are important in
that they are, among other things, the carrier sets for the various semantic
operators we discuss. Such sets themselves may be endowed with various useful
structures. In Chapter 3, we illustrate the point just made by studying
various topologies on spaces of interpretations, including the Scott topology
and a topology called the Cantor topology, as already mentioned. The continuity
of semantic operators in the Scott topology is examined in Chapter 3,
but the treatment of their continuity in the Cantor topology is deferred until
we reach Chapter 5, where the results are needed. In fact, as noted earlier, it
is convergence in these topologies which is of main interest because it can be
used to find models for logic programs as we show in Chapters 5 and 6, and
thus, convergence is the dominant theme in our development of topology.


⁷The uniform characterizations by means of level mappings which will be given in Chapter 2
are due mainly to [Hitzler and Wendt, 2002].


We take the theme of structures defined on spaces of interpretations yet 
further in Chapter 4 in presenting a detailed account of both various gen- 
eralized distance functions defined on spaces of interpretations and their as- 
sociated fixed-point theorems. These tools, some of which depend on level 
mappings again, are developed specifically for investigating semantic oper- 
ators of logic programs with negation, but we believe that Chapter 4 is a 
self-contained account of results which are likely to have applications within 
computer science outside those areas considered here. In Chapter 5, we com- 
bine the developments of Chapters 2, 3, and 4 by applying the fixed-point 
theorems of Chapter 4 to the more important semantic operators introduced 
in Chapter 2. More specifically, we focus on classes of programs, which we 
call unique supported model classes, each of which has the property that all 
programs in that class have a unique supported model. An example of such 
a class is the class of acceptable programs well-known in termination analy- 
sis, but we examine other important unique supported model classes as well. 
These classes are interesting because it turns out that for each of the programs 
they contain, many of the main semantics studied in the earlier chapters co- 
incide, and hence the meaning of each program in a unique supported model 
class is unambiguous relative to the most important semantics. In essence, 
we obtain these classes by applying to various semantic operators those fixed- 
point theorems of Chapter 4 which guarantee a unique fixed point, if there 
is a fixed point at all. The process involves working with successively more 
general semantic operators, especially Fitting-style operators, and examining 
their properties in relation to single-step operators and convergence of their 
iterates in the Cantor topology studied in Chapter 3. Indeed, the process cul- 
minates in a very general semantic operator T which subsumes many of those 
studied in the earlier chapters, and we establish many of its important properties
in Chapter 5. In particular, we examine in depth the continuity of T in the
Cantor topology, thereby obtaining the corresponding results for single-step 
operators and Fitting-style operators. Finally, we note that the work we do 
in this chapter consolidates the uniform approach provided in Chapter 2, em- 
ploying level mappings, to encompass the additional semantics we introduce 
in Chapter 5. 


Turning now to Chapter 6, our objectives here are twofold. First, we revisit 
the stable model semantics and establish a close connection between the well-known
Gelfond-Lifschitz operator $GL_P$ and the fixpoint completion fix(P) for
any normal logic program P by deriving the identity $GL_P(I) = T_{\mathrm{fix}(P)}(I)$, for
any two-valued interpretation I, see Theorem 6.1.4. This will make it a simple
and routine matter to prove many facts about $GL_P$, and hence about the stable
model, from properties of the single-step operator, including the derivation
of continuity properties of $GL_P$. Our second objective in Chapter 6 is to revisit
stratification and the perfect model and to present an iterative process for
obtaining the perfect model for locally stratified normal logic programs. This 
approach involves careful control of negation in order to produce monotonic 
increasing sequences by means of non-monotonic operators and is interesting 
for the insight it gives into the structure of the perfect model. In Chapter 7, 
we apply the topological and analytical tools developed earlier in order to 
discuss logic programming in the context of dynamical systems and artificial 
neural networks with a view, in particular, to presenting a detailed account of 
these methods in the foundations of neural-symbolic integration. Specifically, 
in Chapter 7, we consider the computation by artificial neural networks of 
various semantic operators associated with normal logic programs. We view 
this as a means of integrating these two computing paradigms because both 
can be represented by functions: the semantic operator on the one hand and 
the I/O function of the neural network on the other. In fact, exact computa- 
tion of semantic operators is only possible in the case of propositional normal 
logic programs. In the case of first-order programs, approximation methods 
are required, and this is where analytical and topological methods make their 
entrance. Indeed, it turns out that continuity of a semantic operator in the 
Cantor topology is a necessary and sufficient condition for this approximation 
process to work, see Theorem 7.5.3. This observation is yet further motiva- 
tion for studying the Cantor topology, and hence Chapter 7 represents an 
important application of analytical ideas in logic programming semantics. In 
Chapter 8, we give a brief discussion of further possible applications of our 
results and future directions for research involving the methods and results 
of this book. In particular, we discuss possible future work in the context 
of the foundations of program semantics, quantitative domain theory, fixed- 
point theory, the Semantic Web, and neural-symbolic integration, among other 
things. In the Appendix, we bring together a summary of those facts from the 
theory of ordinals and general topology which will be needed at various points 
in our investigations, but are not developed in the main body of the text; its 
inclusion makes our treatment essentially self-contained. In particular, the re- 
sults of Chapter 3 together with those of the Appendix give a treatment of 
the Scott topology in terms of convergence. 

Finally, on a point of convention, we note that the symbol ■ will be employed
as an end marker in two ways in the body of the text. First, it will
be used to indicate the end of every proof. Second, it will be used on a few 
occasions to mark clearly the end of any statement (theorem, proposition, 
definition, remark, example, program, etc.), where the end of that statement 
might otherwise be unclear. 
Chapter 1


Order and Logic 


The study of the semantics of logic programs rests on a certain amount of 
order theory and logic, and it will be convenient to collect together here in 
this first chapter those basic facts we need throughout the book to accomplish 
this study.¹ At the same time, we establish some notation and terminology
which is common to all the chapters. 


1.1 Ordered Sets and Fixed-Point Theorems 


We start by presenting the minimum amount that we need of the theory 
of ordered sets. In addition, we discuss certain important and well-known 
fixed-point theorems applying to functions defined on ordered sets. In fact, 
the first of these theorems has fundamental applications in the semantics of 
computation in general, as well as in logic programming semantics. 

Let D be a set. Recall that a binary relation $\sqsubseteq$ on D is simply a subset $\sqsubseteq$
of $D \times D$. As usual, the symbol $\sqsubseteq$ will be written infix, and hence we write
$x \sqsubseteq y$ rather than $(x, y) \in {\sqsubseteq}$, where $x, y \in D$. Furthermore, we write $x \sqsubset y$
if $x \sqsubseteq y$ and $x \neq y$. The relation $\sqsubseteq$ on D is called reflexive if, for all $x \in D$,
we have $x \sqsubseteq x$; it is called antisymmetric if, for all $x, y \in D$, $x \sqsubseteq y$ and $y \sqsubseteq x$
imply $x = y$; and it is called transitive if, for all $x, y, z \in D$, $x \sqsubseteq y$ and $y \sqsubseteq z$
imply $x \sqsubseteq z$. We call $\sqsubseteq$ a partial order if $\sqsubseteq$ is reflexive, antisymmetric, and
transitive, and in that case we call the pair $(D, \sqsubseteq)$, or simply D when $\sqsubseteq$ is
understood, a partially ordered set, a poset, or sometimes a partial order by
abuse of terminology. We may sometimes simply refer to a partially ordered
set $(D, \sqsubseteq)$ as an ordered set and to the relation $\sqsubseteq$ as an ordering (on D).
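
For a finite set, the three defining properties of a partial order can be checked by brute force. The following is only a minimal sketch: the function name `is_partial_order` and the sample data (the divisors of 12) are our own illustrative choices and do not come from the text.

```python
from itertools import product

def is_partial_order(elements, leq):
    """Return True if leq is reflexive, antisymmetric, and transitive on elements."""
    elems = list(elements)
    reflexive = all(leq(x, x) for x in elems)
    antisymmetric = all(
        x == y or not (leq(x, y) and leq(y, x))
        for x, y in product(elems, repeat=2)
    )
    transitive = all(
        leq(x, z) or not (leq(x, y) and leq(y, z))
        for x, y, z in product(elems, repeat=3)
    )
    return reflexive and antisymmetric and transitive

# Illustrative data (an assumption, not from the text): the divisors of 12.
DIVISORS = [1, 2, 3, 4, 6, 12]

def divides(x, y):
    return y % x == 0

def strictly_less(x, y):
    return x < y

def always_related(x, y):
    return True

print(is_partial_order(DIVISORS, divides))         # True
print(is_partial_order(DIVISORS, strictly_less))   # False: not reflexive
print(is_partial_order(DIVISORS, always_related))  # False: not antisymmetric
```

Divisibility is a partial order that is not total, since 4 and 6 are incomparable.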

Two elements x and y of a partially ordered set D are said to be comparable
if either $x \sqsubseteq y$ or $y \sqsubseteq x$ holds; otherwise, x and y are called incomparable.
A non-empty subset $A \subseteq D$ is said to be totally ordered by $\sqsubseteq$ or is called
a chain if any two elements of A are comparable with respect to $\sqsubseteq$, that is,
given $a, b \in A$, we have $a \sqsubseteq b$ or $b \sqsubseteq a$. A partial order $\sqsubseteq$ on D is called a
total order if D itself is totally ordered by $\sqsubseteq$. We call A an $\omega$-chain if A is an
increasing sequence $a_0 \sqsubseteq a_1 \sqsubseteq a_2 \sqsubseteq \cdots$, where $\omega$ denotes the first limit ordinal.


1 The text [Davey and Priestley, 2002] is a useful reference for the subject of ordered sets. 
(We refer the reader to the Appendix for a brief discussion of the theory of
ordinals.) We note that any $\omega$-chain is, of course, a chain.

A non-empty subset A of a partially ordered set $(D, \sqsubseteq)$ is called directed if,
for all $a, b \in A$, there is $c \in A$ with $a \sqsubseteq c$ and $b \sqsubseteq c$. An element b in an ordered
set D is called an upper bound of a subset A of D if we have $a \sqsubseteq b$ for all $a \in A$
and is called a least upper bound or supremum of A if b is an upper bound of A
satisfying $b \sqsubseteq b'$ for all upper bounds $b'$ of A. Of course, by antisymmetry, the
supremum, $\bigsqcup A$ or $\sup A$, of A is unique if it exists. Similarly, one defines lower
bound and the greatest lower bound or infimum, $\bigsqcap A$ or $\inf A$, of a subset A
of D. An element x of D is called maximal (minimal) if we do not have $x \sqsubset y$
($y \sqsubset x$) for any element y of D. Given an ordering $\sqsubseteq$ on a set D, we define the
dual ordering $\sqsubseteq^d$ on D by $x \sqsubseteq^d y$ if and only if $y \sqsubseteq x$. Lower bounds, greatest
lower bounds, etc. in $\sqsubseteq$ correspond to upper bounds, least upper bounds, etc.
in $\sqsubseteq^d$.
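
In a finite ordered set, upper bounds, suprema, and directedness can be computed directly from the definitions, and the infimum can be obtained as a supremum in the dual ordering. The sketch below assumes a finite carrier given as a list; all function names and the divisibility example are hypothetical illustrations, not notation from the text.

```python
def upper_bounds(D, leq, A):
    """All b in D lying above every element of A."""
    return [b for b in D if all(leq(a, b) for a in A)]

def supremum(D, leq, A):
    """The least upper bound of A in D, or None if it does not exist."""
    bounds = upper_bounds(D, leq, A)
    least = [b for b in bounds if all(leq(b, c) for c in bounds)]
    return least[0] if least else None

def infimum(D, leq, A):
    """The greatest lower bound, computed as a supremum in the dual ordering."""
    return supremum(D, lambda x, y: leq(y, x), A)

def is_directed(leq, A):
    """A is non-empty and any two of its elements have an upper bound in A."""
    A = list(A)
    return bool(A) and all(
        any(leq(a, c) and leq(b, c) for c in A) for a in A for b in A
    )

# Illustrative data (an assumption): divisors of 12 ordered by divisibility.
D = [1, 2, 3, 4, 6, 12]

def divides(x, y):
    return y % x == 0

print(supremum(D, divides, [4, 6]))                # 12
print(infimum(D, divides, [4, 6]))                 # 2
print(supremum([1, 2, 3, 4, 6], divides, [4, 6]))  # None: no upper bound left
print(is_directed(divides, [2, 3]))                # False
print(is_directed(divides, [1, 2, 4]))             # True: every chain is directed
```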


1.1.1 Definition Let $(D, \sqsubseteq)$ be a partially ordered set.


(1) We call $(D, \sqsubseteq)$ an $\omega$-complete partial order or an $\omega$-cpo if $\bigsqcup A$ exists in D
for each $\omega$-chain A in D, and D has an element $\bot$, called the least element
or bottom element, satisfying $\bot \sqsubseteq x$ for all $x \in D$.


(2) We call $(D, \sqsubseteq)$ chain complete if every chain in D has a supremum.


(3) We call $(D, \sqsubseteq)$ a complete partial order or a cpo if $\bigsqcup A$ exists in D for
each directed subset A of D, and D has a bottom element.


(4) We call $(D, \sqsubseteq)$ a complete upper semi-lattice if $\bigsqcup A$ exists in D for each
directed subset A of D, and $\bigsqcap A$ exists for each subset A of D.


(5) We call $(D, \sqsubseteq)$ a complete lattice if $\bigsqcup A$ and $\bigsqcap A$ exist in D for every
subset A of D.


Later on, we will encounter examples of each of these notions in the context 
of spaces of valuations. Notice that on taking A = D in the previous definition, 
we see that a complete upper semi-lattice or a complete lattice always has 
a bottom element and that a complete lattice always has a top element or 
greatest element, that is, an element $\top$ satisfying $a \sqsubseteq \top$ for all $a \in D$.

There are various implications between the notions formulated in Defini- 
tion 1.1.1, some of which are obvious. Indeed, as far as the various notions of 
completeness are concerned, each defined concept is apparently less general 
than its predecessor. For example, since any chain is a directed set, we see that 
any complete partial order is chain complete, and any chain-complete poset 
with a bottom element is an $\omega$-complete partial order. However, the following
fact, which we simply state, is less trivial.²


2 For a discussion of chain completeness versus completeness (for directed sets), we refer
the reader to [Markowsky, 1976]; see also [Abramsky and Jung, 1994, Proposition 2.1.15].

1.1.2 Proposition A partially ordered set $(D, \sqsubseteq)$ is a complete partial order
if and only if it has a bottom element and is chain complete.


Many aspects of theoretical computer science depend on the notion of 
a partially ordered set. More structure is often required, however, than is 
provided simply by a partial order or even by a complete partial order or 
complete lattice. For example, one needs extra structure in order to model 
standard programming language constructs or to provide an abstract theory of 
computability, as well as having a satisfactory fixed-point theorem available. It 
is now widely recognized that Scott’s theory of domains provides a satisfactory 
setting in which to attain all these objectives, and we will find it useful later on 
to view spaces of valuations as Scott domains. It will therefore be convenient 
to give next the definition of the term “(Scott) domain” in the form in which 
we will always use it. First, however, we need to define the notion of compact 
element. 


1.1.3 Definition Let $(D, \sqsubseteq)$ be a partially ordered set. We call an element
$a \in D$ compact or finite if it satisfies the property that whenever A is directed
and $a \sqsubseteq \bigsqcup A$, we have $a \sqsubseteq x$ for some $x \in A$. We denote the set of compact
elements in D by $D_c$.


Notice that the bottom element in a complete partial order is always a 
compact element, and hence the set $D_c$ is always non-empty in this case. The
compact elements are fundamental in domain theory. 


1.1.4 Definition A Scott-Ershov domain, Scott domain, or just domain 
$(D, \sqsubseteq)$ is a consistently complete algebraic complete partial order. Thus, the
following statements hold.


(1) $(D, \sqsubseteq)$ is a complete partial order.


(2) For each $x \in D$, the set $\mathrm{approx}(x) = \{a \in D_c \mid a \sqsubseteq x\}$ is directed, and we
have $x = \bigsqcup \mathrm{approx}(x)$ (the algebraicity of D).


(3) If the set $\{a, b\} \subseteq D_c$ is consistent (that is, there exists $x \in D$ such that
$a \sqsubseteq x$ and $b \sqsubseteq x$), then $\bigsqcup \{a, b\}$ exists in D (the consistent completeness
of D).


We next give some simple examples of the concepts defined above; note 
that (1) and (2) are special cases of Theorem 1.3.2. 


1.1.5 Example (1) The power set D = P(N) of the set N of natural numbers 
is a complete lattice when ordered by set inclusion. In this ordering, D is 
also a domain in which the compact elements are the finite subsets of N. 
Furthermore, the bottom element of D is the empty set Ø and Ø is also 
the only minimal element of D; the top element of D is N and N is the 
only maximal element of D. 


4 Mathematical Aspects of Logic Programming Semantics 


(2) Let X be a non-empty set, and let D denote the set of all pairs $(I^+, I^-)$,
where $I^+$ and $I^-$ are disjoint subsets of X. We define an ordering on D
by $(I^+, I^-) \sqsubseteq (J^+, J^-)$ if and only if $I^+ \subseteq J^+$ and $I^- \subseteq J^-$. Then D
is a domain in which the bottom element is the pair $(\emptyset, \emptyset)$, the compact
elements of D are the pairs $(I^+, I^-)$ in D in which $I^+$ and $I^-$ are finite
sets, and the maximal elements are the pairs $(I^+, I^-)$ which satisfy
$I^+ \cup I^- = X$. Note that D is not a complete lattice.


(3) Let D denote the set of all partial functions $f : \mathbb{N}^n \to \mathbb{N}$ ordered by graph
inclusion, that is, $f \sqsubseteq g$ if and only if $\mathrm{graph}(f) \subseteq \mathrm{graph}(g)$, where f
and g are partial functions. (Thus, $f \sqsubseteq g$ if and only if whenever f(x)
is defined, so is g(x) and f(x) = g(x).) Then D is a domain in which a 
partial function f is a compact element if and only if graph(f) is a finite 
set and the bottom element is the empty function. Here, the maximal 
elements of D are the total functions. Again, D is not a complete lattice. 
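
The third example can be made concrete for compact elements by representing a partial function with finite graph as a Python dictionary. The sketch below is restricted to unary functions for brevity, and the names `leq`, `consistent`, and `join` are hypothetical; it shows that two consistent compact elements have a supremum, while two inconsistent ones have no upper bound at all, which is why D is not a complete lattice.

```python
def leq(f, g):
    """Graph inclusion: wherever f is defined, g is defined with the same value."""
    return all(k in g and g[k] == v for k, v in f.items())

def consistent(f, g):
    """f and g agree wherever both are defined, so they have a common upper bound."""
    return all(g[k] == v for k, v in f.items() if k in g)

def join(f, g):
    """Supremum of {f, g} if the pair is consistent, otherwise None."""
    if not consistent(f, g):
        return None
    return {**f, **g}

bottom = {}    # the empty function
f = {0: 1}
g = {1: 5}
h = {0: 2}     # disagrees with f at the argument 0

print(leq(bottom, f))   # True
print(join(f, g))       # {0: 1, 1: 5}
print(join(f, h))       # None
```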


1.1.6 Remark Mathematically speaking, the denotational semantics, or 
mathematical semantics, approach to the theory of procedural and functional 
programming languages is highly involved with providing a satisfactory frame- 
work within which to model constructs made in conventional programming 
languages. Such frameworks must be closed under the formation of products, 
sums, and function spaces and therefore are, simply, Cartesian closed cate- 
gories. One of the most successful Cartesian closed categories to have arisen 
out of these considerations is that of Scott domains,³ as formulated in Definition
1.1.4. Moreover, most functions and operators encountered within domain
theory are order continuous, see Definition 1.1.7, and therefore the most useful 
fixed-point theorem in domain theory is Theorem 1.1.9. On the other hand, as 
we shall see in the next chapter and subsequent chapters, a logic program has 
a well-defined and mathematically precise meaning inherent in its very nature, 
namely, its semantics as a first-order logical theory. In addition, certain impor- 
tant operators arising in logic programming are not monotonic in general due 
to the presence of negation, resulting in Theorems 1.1.9 and 1.1.10 often be- 
ing inapplicable, and this has no direct parallel in conventional programming 
language semantics. For these reasons, the semantics of logic programming 
languages has developed rather differently from that of procedural program- 
ming languages. Nevertheless, we shall study domains in Chapter 4, in the 
context of fixed-point theory.⁴


If D is a set, A is a subset of D, and $f : D \to D$ is a function, then we
denote the image set $\{f(a) \mid a \in A\}$ of A under f by f(A). We also define


3 See [Scott, 1982b].

4 Our basic references to domain theory are the book [Stoltenberg-Hansen et al., 1994]
and the book chapter [Abramsky and Jung, 1994], but the reader interested in domain 
theory may also care to consult the notes of G.D. Plotkin [Plotkin, 1983] and also the 
comprehensive treatment to be found in [Gierz et al., 2003]. 
iterates of a function $f : D \to D$ inductively as follows: $f^0(x) = x$, and
$f^{n+1}(x) = f(f^n(x))$ for all $n \in \mathbb{N}$ and $x \in D$.
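
The image set and the finite iterates translate directly into code. This is a small sketch with hypothetical names `image` and `iterate`; the doubling function is an arbitrary illustration.

```python
def image(f, A):
    """The image set f(A) = {f(a) | a in A}."""
    return {f(a) for a in A}

def iterate(f, n, x):
    """The n-th iterate of f at x: apply f to x exactly n times."""
    for _ in range(n):
        x = f(x)
    return x

def double(k):
    return 2 * k

print(image(double, {1, 2, 3}))  # {2, 4, 6}
print(iterate(double, 0, 1))     # 1
print(iterate(double, 3, 1))     # 8
```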


1.1.7 Definition A function $f : D \to E$ between posets D and E is called
monotonic if, for all $a, b \in D$ with $a \sqsubseteq b$, we have $f(a) \sqsubseteq f(b)$. Furthermore,
f is called antitonic if, for all $a, b \in D$ with $a \sqsubseteq b$, we have $f(b) \sqsubseteq f(a)$. If
D and E are $\omega$-complete partial orders, then a function $f : D \to E$ is called
$\omega$-continuous if it is monotonic and $\bigsqcup f(A) = f(\bigsqcup A)$ for each $\omega$-chain A in
D. Finally, if D and E are complete partial orders, then f is called (order)
continuous if, again, it is monotonic and, for every directed subset A of D, we
have $\bigsqcup f(A) = f(\bigsqcup A)$.
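
On a finite poset, monotonicity and antitonicity can be tested exhaustively. The sketch below uses the power set of {0, 1, 2} ordered by inclusion; the helper names and the two sample functions are our own assumptions. Set complement is antitonic, which gives a first hint of how negation can destroy monotonicity.

```python
from itertools import combinations, product

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def is_monotonic(D, leq_D, leq_E, f):
    """a below b in D implies f(a) below f(b) in E."""
    return all(leq_E(f(a), f(b)) for a, b in product(D, repeat=2) if leq_D(a, b))

def is_antitonic(D, leq_D, leq_E, f):
    """a below b in D implies f(b) below f(a) in E."""
    return all(leq_E(f(b), f(a)) for a, b in product(D, repeat=2) if leq_D(a, b))

UNIVERSE = frozenset({0, 1, 2})
D = powerset(UNIVERSE)

def subset(a, b):
    return a <= b

def add_zero(a):
    return a | {0}

def complement(a):
    return UNIVERSE - a

print(is_monotonic(D, subset, subset, add_zero))    # True
print(is_antitonic(D, subset, subset, complement))  # True
print(is_monotonic(D, subset, subset, complement))  # False
```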


We note that if f is monotonic, then the image of any $\omega$-chain under f
is an $\omega$-chain, and similarly the image of any directed set under f is itself
a directed set. Therefore, the two suprema required in making the previous
definition always exist. Indeed, it is easy to see that, equivalently,⁵ one may
define f to be continuous by requiring, for each directed set A, that f(A)
is a directed set and that $\bigsqcup f(A) = f(\bigsqcup A)$. In fact, if f is monotonic and
A is directed, then it is easily checked that the inequality $\bigsqcup f(A) \sqsubseteq f(\bigsqcup A)$
always holds. Therefore, it follows that f is continuous if and only if it is
monotonic and $f(\bigsqcup A) \sqsubseteq \bigsqcup f(A)$ whenever $A \subseteq D$ is directed. As a matter
of fact, preservation of suprema of chains is enough in defining continuity as
shown by the next result, which again we simply state. We note finally that
if a function f between complete partial orders is continuous, then it is clear
that it is $\omega$-continuous as a function between $\omega$-complete partial orders.


1.1.8 Proposition A function $f : D \to E$ between complete partial orders is
continuous if and only if it is monotonic and $\bigsqcup f(A) = f(\bigsqcup A)$ for each chain
A in D.


We define ordinal powers of a monotonic function f on a complete partial
order $(D, \sqsubseteq)$ inductively as follows: $f \uparrow 0 = \bot$, $f \uparrow (\alpha + 1) = f(f \uparrow \alpha)$ for any
ordinal $\alpha$, and $f \uparrow \alpha = \bigsqcup \{f \uparrow \beta \mid \beta < \alpha\}$ if $\alpha$ is a limit ordinal. Noting that
$(D, \sqsubseteq)$ is chain complete, being a complete partial order, it is straightforward
using transfinite induction to see that $f \uparrow \beta \sqsubseteq f \uparrow \alpha$ whenever $\beta < \alpha$, and
hence that ordinal powers of f are well-defined. More generally, the same
comments apply to the ordinal powers $f^\alpha(x)$ for any $x \in D$ which satisfies
$x \sqsubseteq f(x)$: we define $f^0(x) = x$, $f^{\alpha+1}(x) = f(f^\alpha(x))$ for any ordinal $\alpha$, and
$f^\alpha(x) = \bigsqcup \{f^\beta(x) \mid \beta < \alpha\}$ if $\alpha$ is a limit ordinal.

A fixed point of a function $f : D \to D$ is an element $x \in D$ satisfying
f(x) = x. A pre-fixed point of a function f on a poset $(D, \sqsubseteq)$ is an element
$y \in D$ satisfying $f(y) \sqsubseteq y$. Finally, a post-fixed point of f is an element $y \in D$
satisfying $y \sqsubseteq f(y)$. The least fixed point, lfp(f), of f is a fixed point x of f


5 This is the definition adopted in [Stoltenberg-Hansen et al., 1994].

6 A discussion of the various ways of formulating the notion of continuity is to be found
in [Markowsky, 1976].
satisfying the property: if y is a fixed point of f, then $x \sqsubseteq y$. Least pre-fixed
points and least post-fixed points are defined similarly.

The following two theorems are fundamental in handling the semantics of 
logic programs.⁷ Indeed, the first of them, which is frequently referred to as the
fixed-point theorem, is fundamental in procedural and functional programming
as well.⁸


1.1.9 Theorem (Kleene) Let $(D, \sqsubseteq)$ denote an $\omega$-complete partial order
and let $f : D \to D$ be $\omega$-continuous. Then f has a least fixed point $x = f \uparrow \omega$
which is also its least pre-fixed point.


Proof: We sketch the proof of this well-known result.

The sequence $(f \uparrow n)_{n \in \mathbb{N}}$ is an $\omega$-chain. It therefore has a supremum $f \uparrow \omega = x$,
say. By $\omega$-continuity, we have
$x = f \uparrow \omega = \bigsqcup \{f \uparrow (n + 1) \mid n \in \mathbb{N}\} = f(\bigsqcup \{f \uparrow n \mid n \in \mathbb{N}\}) = f(x)$,
and so x is a fixed point of f. If y is a pre-fixed point of
f, then $\bot \sqsubseteq y$, and, by monotonicity of f, we obtain $f \uparrow 1 = f(\bot) \sqsubseteq f(y) \sqsubseteq y$.
Inductively, it follows that $f \uparrow n \sqsubseteq y$ for all $n \in \mathbb{N}$, and hence $x = f \uparrow \omega \sqsubseteq y$.
So x is the least pre-fixed point of f and hence also its least fixed point. ■
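
On a finite lattice, the chain of finite powers in the proof above stabilizes after finitely many steps, so the least fixed point can be computed by plain iteration from the bottom element. The sketch below works in the power set of a finite set of propositional atoms; the rule set `RULES` and the operator `step` are hypothetical illustrations that only anticipate, informally, the operators studied in later chapters.

```python
def kleene_lfp(f, bottom):
    """Iterate f from bottom until the value stops changing.

    Termination is only guaranteed here because f is monotonic on a
    finite lattice; in general the supremum of the chain is needed.
    """
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# Hypothetical propositional rules (head, body): p.  q <- p.  r <- p, q.  s <- t.
RULES = [("p", []), ("q", ["p"]), ("r", ["p", "q"]), ("s", ["t"])]

def step(I):
    """Heads of all rules whose bodies hold in the set I of true atoms."""
    return frozenset(h for h, body in RULES if all(b in I for b in body))

lfp = kleene_lfp(step, frozenset())
print(sorted(lfp))  # ['p', 'q', 'r']

# Any pre-fixed point lies above the least fixed point.
pre_fixed = frozenset({"p", "q", "r", "s", "t"})
print(step(pre_fixed) <= pre_fixed, lfp <= pre_fixed)  # True True
```

The atom s is never derived because t is not the head of any rule.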


By our earlier observation that a continuous function is $\omega$-continuous, this
theorem applies, of course, to continuous functions on complete partial orders.
Moreover, if the function is not $\omega$-continuous, but is monotonic, the existence
of a least fixed point can still be guaranteed, as we see next.⁹


1.1.10 Theorem (Knaster-Tarski) Let $(D, \sqsubseteq)$ denote a complete partial
order, let $f : D \to D$ be monotonic, and let $x \in D$ be such that $x \sqsubseteq f(x)$.
Then f has a least fixed point a above x, meaning $x \sqsubseteq a$, which is also the
least pre-fixed point of f above x, and there exists a least ordinal $\alpha$ such that
$a = f^\alpha(x)$. In particular, f has a least fixed point a which is also its least
pre-fixed point.


Proof: Again, this theorem is well-known, and we just sketch its proof.

Let $\gamma$ be an ordinal whose cardinality exceeds that of D, and form the
set $\{f^\beta(x) \mid \beta < \gamma\}$. By cardinality considerations, there must be ordinals
$\alpha < \beta < \gamma$ with $f^\alpha(x) = f^\beta(x)$, and we can assume without loss of generality
that $\alpha$ is least with this property. Since $f^\alpha(x) \sqsubseteq f(f^\alpha(x)) \sqsubseteq f^\beta(x) = f^\alpha(x)$,


7 Fixed points of certain operators associated with logic programs are of extreme importance
in the semantics of logic programs, as we shall see in later chapters.

8 A result similar to Kleene’s theorem, in fact, equivalent to it, is the well-known theorem
due to Tarski and Kantorovitch in which $\omega$-chains are replaced by countable chains, see
[Jachymski, 2001]. Indeed, the collection containing [Jachymski, 2001] is an excellent general
reference to fixed-point theory. As noted in [Lloyd, 1987], the reference [Lassez et al., 1982]
contains an interesting discussion of the history of fixed-point theorems on ordered sets.

9 In attributing Theorem 1.1.10 to Knaster and Tarski, we are noting Proposition 1.1.2
and then following Jachymski in [Jachymski, 2001]. Theorem 1.1.9 is usually attributed to
Kleene, since this theorem is an abstract formulation of the first recursion theorem, and we
are consistent with [Jachymski, 2001] in this respect.
we obtain that $f^\alpha(x) = f(f^\alpha(x))$, and so $a = f^\alpha(x)$ is a fixed point of f.
Clearly, we have $x \sqsubseteq a$. Furthermore, if b is any pre-fixed point of f with
$x \sqsubseteq b$, then by monotonicity of f and the fact that $f(b) \sqsubseteq b$ we obtain
$f^\beta(x) \sqsubseteq b$ for all ordinals $\beta$. Hence, $a \sqsubseteq b$, and so a is both the least pre-fixed
point and the least fixed point of f above x.

To obtain the final conclusion, we simply set $x = \bot$ and note then that
$\bot \sqsubseteq f(\bot)$. ■
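
In the finite case no transfinite iteration is needed, and the least fixed point above a post-fixed point x can be found by iterating f from x. The following sketch uses a reachability operator on a small directed graph; the graph `EDGES`, the operator `f`, and the helper `lfp_above` are all hypothetical. It also shows that the least fixed point above x can be strictly larger than the least fixed point itself.

```python
def lfp_above(f, x, leq):
    """Least fixed point of a monotonic f above x, for a finite lattice.

    Requires x to be a post-fixed point, that is, x below f(x).
    """
    if not leq(x, f(x)):
        raise ValueError("x must satisfy x <= f(x) in the given ordering")
    while f(x) != x:
        x = f(x)
    return x

# Hypothetical directed graph; node 5 carries a self-loop.
EDGES = [(0, 1), (1, 2), (5, 5), (5, 6)]

def f(S):
    """Node 0 together with all successors of nodes in S (monotonic in S)."""
    return frozenset({0}) | frozenset(v for u, v in EDGES if u in S)

def subset(a, b):
    return a <= b

least = lfp_above(f, frozenset(), subset)
above = lfp_above(f, frozenset({0, 5}), subset)
print(sorted(least))  # [0, 1, 2]
print(sorted(above))  # [0, 1, 2, 5, 6]
```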


Note that, in particular, the least fixed point of f is equal to $f \uparrow \alpha$ for
some ordinal $\alpha$. We call the smallest ordinal $\alpha$ with this property the closure
ordinal of f.

One other point to make in this context is that Kleene’s theorem shows 
that $\omega$-continuity ensures that in finding a fixed point the iteration will not
continue beyond the first infinite ordinal $\omega$. This contrasts with the Knaster-Tarski
theorem, where it may be necessary to iterate beyond $\omega$ if one only has
monotonicity of the operators in question. This is a significant point in rela- 
tion to computability considerations and explains the importance of Kleene’s 
theorem in the theory of computation. 


1.2 First-Order Predicate Logic 


We assume that the reader has a slight familiarity with first-order predicate 
logic, but for convenience we summarize next the elementary concepts of the 
subject, beginning by formally describing its syntax.¹⁰


1.2.1 Syntax of First-Order Predicate Logic 


As usual, an alphabet $\mathcal{A}$ consists of the following classes¹¹ of symbols:
a (possibly empty) collection of constant symbols a, b, c, d, ...; a non-empty
collection of variable symbols u, v, w, x, y, z, ...; a (possibly empty) collection
of function symbols f, g, h, ...; and a non-empty collection of predicate symbols¹²
p, q, r, .... In addition, we have the connectives $\neg$, $\wedge$, $\vee$, $\leftarrow$, and $\leftrightarrow$; the
quantifiers $\forall$ and $\exists$; and the punctuation symbols “(”, “)” and “,”. The arity
of a function symbol f or of a predicate symbol p is commonly denoted by
#(f) or by #(p).

In the following four definitions, we assume that $\mathcal{A}$ denotes some fixed,
but arbitrary, alphabet.

10 Our approach to the syntax and semantics of first-order logic is standard and is to be
found in any of the well-known texts on mathematical logic, see, for example, [Hodel, 1995,
Mendelson, 1987]. For fuller details of logic in relation to logic programming, the reader
may care to consult [Apt, 1997] or [Lloyd, 1987].

11 Similarly, our use of classes in the definition of an alphabet is also standard in developing
first-order logic and, in our case, is not intended to hint at foundational issues. In logic
programming practice, the classes referred to, namely, those of constant, variable, function,
and predicate symbols, will be finite sets. When working with the set $\mathrm{ground}_J(P)$ defined in
Chapter 2, J will usually (although not necessarily) denote the Herbrand preinterpretation,
and then we will in effect be working with a set containing a possibly denumerable collection
of elements (atoms, in fact).


1.2.1 Definition We define a term (over $\mathcal{A}$) inductively¹³ as follows.

(1) Each constant symbol in $\mathcal{A}$ is a term.

(2) Each variable symbol in $\mathcal{A}$ is a term.

(3) If f is any n-ary function symbol in $\mathcal{A}$ and $t_1, \ldots, t_n$ are terms, then
$f(t_1, \ldots, t_n)$ is a term.

A term is called ground if it contains no variable symbols.


1.2.2 Definition An atom, atomic formula, or proposition A (over $\mathcal{A}$) is an
expression of the form $p(t_1, \ldots, t_n)$, where p is an n-ary predicate symbol in
$\mathcal{A}$ and $t_1, \ldots, t_n$ are terms (over $\mathcal{A}$).


1.2.3 Definition A literal L is an atom A or the negation $\neg A$ of an atom
A. Atoms A are sometimes called positive literals, and negated atoms $\neg A$ are
sometimes called negative literals.
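
The inductive definitions of terms, atoms, and literals map naturally onto recursive data types. The sketch below is one possible encoding, not a prescribed one: the class names `Const`, `Var`, `Fn`, `Atom`, and `Literal` and the two groundness functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Fn:
    symbol: str
    args: Tuple["Term", ...]

Term = Union[Const, Var, Fn]

@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[Term, ...]

@dataclass(frozen=True)
class Literal:
    atom: Atom
    positive: bool = True   # False encodes the negated atom

def term_is_ground(t):
    """A term is ground if it contains no variable symbols."""
    if isinstance(t, Var):
        return False
    if isinstance(t, Const):
        return True
    return all(term_is_ground(s) for s in t.args)

def atom_is_ground(a):
    return all(term_is_ground(t) for t in a.args)

a, b, x = Const("a"), Const("b"), Var("x")
ground_term = Fn("g", (a, Fn("f", (b,))))   # g(a, f(b))
open_term = Fn("g", (a, Fn("f", (x,))))     # g(a, f(x))

print(term_is_ground(ground_term))                # True
print(term_is_ground(open_term))                  # False
print(atom_is_ground(Atom("p", (ground_term,))))  # True
```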


1.2.4 Definition A (well-formed) formula (over $\mathcal{A}$) is defined inductively as
follows.


(1) Each atom is a well-formed formula.


(2) If F and G are well-formed formulae, then so are $\neg F$, $F \wedge G$, $F \vee G$,
$F \leftarrow G$, and $F \leftrightarrow G$.


(3) If F is a well-formed formula and x is a variable symbol, then $\forall x F$ and
$\exists x F$ are well-formed formulae also.


A well-formed formula is called ground if it contains no variable symbols.
Thus, in particular, a ground atom is an atom containing no variable symbols. ■


Of course, brackets are needed in writing down well-formed formulae to
avoid ambiguity. Their use can be minimized, however, by means of the customary
precedence hierarchy (in descending order) in which $\neg$, $\forall$, $\exists$ have highest
precedence, followed by that of $\vee$, followed next by the precedence of $\wedge$,
and finally followed by $\leftarrow$ and $\leftrightarrow$ with the lowest precedence.


12 Constant symbols, variable symbols, function symbols, and predicate symbols are sometimes
referred to as simply constants, variables, functions, and predicates, respectively.

13 As usual, in giving inductive definitions of sets, we omit the explicit statement of the 
closure step and assume that what is being defined is the smallest set satisfying the basis 
and induction steps. 
1.2.5 Definition The first-order language $\mathcal{L}$ given by an alphabet $\mathcal{A}$ consists
of the set of all well-formed formulae determined by the symbols of $\mathcal{A}$. We refer
to terms over $\mathcal{A}$ as terms in or over $\mathcal{L}$.


1.2.6 Example Suppose we are given an alphabet $\mathcal{A}$ containing constant
symbols a and b; variable symbols x and y; a unary function symbol f
and a binary function symbol g; and a unary predicate symbol p and a binary
predicate symbol q. Then the following are examples of terms over $\mathcal{A}$:
$a$, $b$, $x$, $y$, $f(a)$, $f(x)$, $g(a, f(x))$, $g(g(a, x), f(y))$, $f(g(a, b))$, .... In particular,
we note that, for example, $f(a)$ and $g(a, f(b))$ are ground terms, whereas
$g(a, f(x))$ is not.

Furthermore, the following are examples of well-formed formulae in the
first-order language $\mathcal{L}$ determined by $\mathcal{A}$: $p(a)$, $q(a, g(b, b))$, $\neg p(x)$, $q(x, g(a, y))$,
$q(x, g(a, y)) \vee (p(y) \wedge \neg p(x))$,
$p(x) \leftarrow p(f(a)) \wedge q(f(b), g(x, f(y))) \wedge q(x, g(y, b))$,
$p(x) \leftrightarrow q(f(x), g(a, x))$,
$\forall x(p(x) \leftarrow p(a) \wedge \neg q(f(b), g(a, f(x))) \wedge q(a, g(x, b)))$.
In particular, the last of these is in a form of great significance in logic programming.
Moreover, $p(a)$ and $q(a, g(b, b))$, for example, are ground (atomic)
formulas, whereas $\forall x \forall y(p(x) \leftarrow p(a) \wedge \neg q(f(b), g(x, f(y))) \wedge q(x, g(y, b)))$ is
not ground.


1.2.2 Semantics of First-Order Predicate Logic 


The preceding definitions formally describe the syntax of first-order predicate logic.
We want now, briefly, to describe formally the semantics or meaning given to 
well-formed formulae. In doing this, we adopt the usual set-based approach 
from model theory, but with two caveats which direct us. The first is that we 
do need to handle more truth values than just the two conventional ones. The 
second is that we do not usually need to handle quantified formulae because, 
for purposes of the semantics of logic programs P, we usually consider the set 
ground(P), as defined in Chapter 2, instead of P itself, and elements of the 
former contain no variable symbols and no quantifiers. However, in order to 
proceed further it is necessary to discuss spaces of truth values, and we do 
this next. 


In classical two-valued logic and almost always in mathematics it is usual 
to employ the set TWO = {f,t} of truth values false f and true t. How- 
ever, in many places in logic programming and in other areas of comput- 
ing, it has been found advantageous to employ more truth values than these. 
Indeed, quite early on, Melvin Fitting argued in several places for the use 
in logic programming of Kleene’s strong and weak three-valued logics, see 
[Fitting, 1985, Fitting and Ben-Jacob, 1990], for example, in which the truth 
set is THREE = {u,f, t}. Here, f denotes false and t denotes true, again, but 
u denotes a third truth value which may be thought of as representing under- 
defined, none (neither true nor false) or no information, or, in some contexts, 
non-termination.¹⁴ These and other three-valued logics will be encountered
in Chapter 2 and in many other places in Chapters 3, 5, 6, and 7. 

Fitting also considered Belnap’s four-valued logic¹⁵ in which the truth set
is FOUR = {u,f,t,b}. Here, b denotes a fourth truth value intended to 
represent both true and false, both or overdefined, which, it can be argued, 
should be used to handle the conflicting information “both true and false” 
returned, perhaps, in a distributed logic programming system. On a point of 
notation, we remark that the listing of the elements in TWO corresponds to 
the truth ordering $\leq_t$, as defined in Section 1.3.2, and in the case of THREE
and FOUR the listing is derived from the knowledge ordering $\leq_k$, see again
Section 1.3.2, with incomparable elements listed alphabetically. 

A fundamental concept throughout this work is that of valuation, or in- 
terpretation, and also that of model. Indeed, spaces of interpretations are one 
of the central concepts here when viewed as the carrier sets for various se- 
mantic operators determined by programs. We will usually work later on in 
the truth sets TWO and THREE and sometimes in FOUR. Nevertheless, in
formulating the concepts of valuation and interpretation, we will work quite
generally, at no extra cost, and allow arbitrary sets of truth values and certain
connectives defined on them. Thus, let $\mathcal{T}$ denote an arbitrary set of truth
values or truth set containing at least two elements, one of which will be the
distinguished value t, denoting true. We assume further that certain binary
connectives, namely, conjunction ($\wedge$) and disjunction ($\vee$) are given, together
with a unary connective negation ($\neg$), as functions over $\mathcal{T}$. A third binary
connective implication ($\leftarrow$) may also be given or it may be defined in terms
of the other connectives, and the latter is the way we will usually handle implication.
However, we will defer giving the definition of implication we want
until we have dealt with orderings on truth sets, see Definition 1.3.3. A set
$\mathcal{T}$ together with specified definitions of these connectives will be referred to
as a logic and, when the definitions of the connectives are understood, will
be denoted simply by the underlying truth set $\mathcal{T}$ without causing confusion.
Quite often, the definitions of $\wedge$, $\vee$, and $\neg$ are given by means of a truth
table, and this is the case for most of the logics we encounter here. For example,
Table 1.1 specifies Belnap’s logic as employed by Fitting and by us.
It contains classical two-valued logic and Kleene’s strong three-valued logic
as sublogics.¹⁶ Moreover, FOUR is a complete lattice, as we see later, and is
therefore technically easy to work with. Indeed, these are some of the reasons
why four-valued logic plays an important unifying role in the theory¹⁷ and is


14 The truth value u is sometimes denoted in the literature by n, indicating none.

15 We refer to [Belnap, 1977, Fitting, 1991, Fitting, 2002], but note that Fitting worked
with a minor variant of the logic defined in [Belnap, 1977]; we work with this same variant
of Belnap’s definition.

16 The term a sublogic $S$ of a logic $\mathcal{T}$ means that $S$ is a subset of the set $\mathcal{T}$ of truth values,
and the connectives in $S$ are restrictions to $S$ of the corresponding connectives in $\mathcal{T}$.

17 Fitting has shown the utility of FOUR, when viewed as a bilattice, in giving a unified
treatment of several aspects of logic programming, and we refer the reader to [Fitting, 2002]
and the works cited therein for more details.




TABLE 1.1: Belnap’s four-valued logic. 


 $p$   $q$     $\neg p$   $p \wedge q$   $p \vee q$
  u     u         u            u             u
  u     f         u            f             u
  u     t         u            u             t
  u     b         u            f             t
  f     u         t            f             u
  f     f         t            f             f
  f     t         t            f             t
  f     b         t            f             b
  t     u         f            u             t
  t     f         f            f             t
  t     t         f            t             t
  t     b         f            b             t
  b     u         b            f             t
  b     f         b            f             b
  b     t         b            b             t
  b     b         b            b             b


the main reason we work with it despite the fact that most of our applications
are to TWO and THREE. Notice that Kleene’s weak three-valued logic also
uses the truth set THREE, but its connectives are defined by Table 5.1.¹⁸


The next two definitions are fundamental. In presenting the first of them, 
we will use the notation commonly employed in logic programming. 


1.2.7 Definition Let $\mathcal{L}$ be a first-order language and let $D$ be a
non-empty set. A preinterpretation $J$ for $\mathcal{L}$ with domain $D$ (of
preinterpretation) is an assignment $\cdot^J$ which satisfies the following:
(1) $c^J \in D$ for each constant symbol $c$ in $\mathcal{L}$, and (2) $f^J$ is
an $n$-ary function over $D$ for each $n$-ary function symbol $f$ in
$\mathcal{L}$. A $J$-variable assignment is a (total) mapping, $\theta$, say,
from variable symbols to elements of $D$.

Given a preinterpretation $J$ with domain $D$ and a $J$-variable assignment
$\theta$, we can assign to each term $t$ in $\mathcal{L}$ an element of $D$,
called its denotation or term assignment, inductively as follows:
$(t\theta)^J = \theta(t)$ if $t$ is a variable symbol, $(t\theta)^J = t^J$ if
$t$ is a constant symbol, and
$(t\theta)^J = f^J((t_1\theta)^J, \dots, (t_n\theta)^J)$ if
$t = f(t_1, \dots, t_n)$ for some $n$-ary function symbol $f$ and terms
$t_1, \dots, t_n$. For an atom $A = p(t_1, \dots, t_n)$, say, in the language
$\mathcal{L}$, we define $(A\theta)^J$ to be the symbol
$p((t_1\theta)^J, \dots, (t_n\theta)^J)$ and call this a $J$-ground instance of
the atom $p(t_1, \dots, t_n)$. We denote by $B_{\mathcal{L},J}$ the set of
$J$-ground instances of atoms in $\mathcal{L}$.


18 Indeed, disjunction and conjunction in Kleene’s weak three-valued logic are given by
$\vee_2$ and $\wedge_3$, respectively, in Table 5.1, see also [Fitting, 1994a].
Thus, $B_{\mathcal{L},J}$ is the set of all symbols $p(d_1, \dots, d_n)$, where
$p$ is an $n$-ary predicate symbol in $\mathcal{L}$ and
$d_1, \dots, d_n \in D$. ∎
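
The inductive clauses of Definition 1.2.7 can be illustrated by a short Python sketch. The class names `Var`, `Const`, `Fn`, and `Preinterpretation`, and the example domain of natural numbers with a successor function, are assumptions made purely for illustration and do not appear in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str
    args: Tuple["Term", ...]


Term = Union[Var, Const, Fn]


@dataclass
class Preinterpretation:
    """Hypothetical container for the assignment J of Definition 1.2.7."""
    constants: Dict[str, object]                  # c maps to c^J in D
    functions: Dict[str, Callable[..., object]]   # f maps to f^J over D


def denote(term: Term, J: Preinterpretation, theta: Dict[str, object]) -> object:
    """Compute the denotation of a term under J and the variable assignment theta."""
    if isinstance(term, Var):
        return theta[term.name]                   # variable: theta(t)
    if isinstance(term, Const):
        return J.constants[term.name]             # constant: t^J
    # compound term: apply f^J to the denotations of the arguments
    return J.functions[term.symbol](*(denote(a, J, theta) for a in term.args))


def ground_instance(pred: str, args: Tuple[Term, ...], J, theta):
    """Return the J-ground instance of p(t_1, ..., t_n) as a (symbol, tuple) pair."""
    return (pred, tuple(denote(a, J, theta) for a in args))


# Assumed example: D is the natural numbers, with a constant "zero" and successor "s".
J = Preinterpretation(constants={"zero": 0}, functions={"s": lambda n: n + 1})
theta = {"x": 5}
two_steps = Fn("s", (Fn("s", (Var("x"),)),))
print(denote(two_steps, J, theta))
print(ground_instance("even", (Fn("s", (Const("zero"),)),), J, theta))
```

Representing a ground instance as a pair of a predicate symbol and a tuple of domain elements mirrors the description of the set of $J$-ground instances as symbols of the form $p(d_1, \dots, d_n)$.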


1.2.8 Definition Let $\mathcal{L}$ be a first-order language, let $J$ be a
preinterpretation for $\mathcal{L}$ with domain $D$, and let $\mathcal{T}$ be a
logic. A valuation or interpretation for $\mathcal{L}$ (based on $J$) with
values in $\mathcal{T}$ is a mapping $v : B_{\mathcal{L},J} \to \mathcal{T}$.
Let $v : B_{\mathcal{L},J} \to \mathcal{T}$ be a valuation and let $\theta$ be a
$J$-variable assignment. Then $v$ and $\theta$ determine, inductively, a
well-defined truth value in $\mathcal{T}$ for any quantifier-free, well-formed
formula $F$ in $\mathcal{L}$ by means of the construction of $F$ and the
definitions of the connectives in $\mathcal{T}$. We say that $v$ is a model for
$F$, written $v \models F$, if $v$ gives truth value t to $F$. We sometimes
refer to valuations, interpretations, and models based on $J$ as
$J$-valuations, $J$-interpretations, and $J$-models.


In fact, if $\mathcal{T}$ is ordered as a complete lattice (and this issue will
be considered shortly), then a valuation $v$ gives a unique truth value in
$\mathcal{T}$, in the standard way, to any closed well-formed formula $F$ in
$\mathcal{L}$: universal quantification corresponds to the infimum of a set of
truth values, and existential quantification corresponds to the supremum of a
set of truth values. The term closed here has, of course, its normal meaning in
mathematical logic, namely, that each variable symbol occurring in $F$ falls
within the scope of a quantifier. (By default, we allow the term closed to
apply to formulae with no variable symbols and no quantifiers.) Once this
observation is made, one can go on, in the standard way, to define at our
present level of generality the terms model, (un)satisfiable, valid, and
logical consequence when applied to sets of closed well-formed formulae.


1.3 Ordered Spaces of Valuations 


Following Definition 1.2.8, we will generally denote the set of all valuations
for $\mathcal{L}$ based on $J$ with values in $\mathcal{T}$ by
$I(B_{\mathcal{L},J}, \mathcal{T})$, and we will consider
$I(B_{\mathcal{L},J}, \mathcal{T})$ as an ordered set. The orderings we have in
mind are derived from orderings on $\mathcal{T}$, and the set
$B_{\mathcal{L},J}$ plays no role in this. Therefore, to ease notation we will
work with an arbitrary set $X$ for the rest of this chapter. Thus, we regard a
valuation or interpretation for the time being as simply a mapping
$X \to \mathcal{T}$ and denote the set of all these by $I(X, \mathcal{T})$;
typical elements of $I(X, \mathcal{T})$ will be denoted by $u$, $v$, etc. Later
on, in applying the results of this section, we will of course take $X$ to be a
set of ground atoms or of $J$-ground instances of atoms, and no confusion will
be caused. There is, however, a convention we need to establish concerning the
terminology “valuation” versus “interpretation”, as follows.
1.3.1 Remark Much of the theory of logic programming semantics is concerned
with sets of valuations. It is important, therefore, to have convenient
notation for valuations and to have ways of representing them, which both 
facilitate discussion and also allow easy passage backwards and forwards be- 
tween the different representations employed. There are three ways of han- 
dling valuations, which are commonly used in the literature on the subject 
and which we adopt also. Having these three forms available will, in certain 
places, greatly increase readability and reduce technical difficulty. 

First, when considering general structures such as orderings or topologies
on $I(X, \mathcal{T})$, the easiest way is to think of valuations as mappings,
and this we will usually do. Thus, in the main, our future use of the term
valuation will refer to mappings whose domain is a set of atoms (or ground
instances of atoms) and whose codomain is a set $\mathcal{T}$ of truth values.

Second, when $\mathcal{T}$ is a small set containing two, three, or four
elements, say, it is convenient to identify a valuation with the (ordered)
tuple of sets on which it takes the various truth values in $\mathcal{T}$, as
discussed in Section 1.3.2. This is by far the most frequently used
representation, and, in common with most authors, we will in future usually
employ the term interpretation when thinking in these terms. Thus, as we
progress, we increasingly employ the terminology interpretation instead of
valuation, use the standard notation $I$, $K$, etc. to denote interpretations,
and adopt the notation described at the end of Section 2.1 for sets of
interpretations.

Third, there is yet another representation frequently used for interpreta- 
tions when T is the set THREE, namely, signed sets as discussed in Sec- 
tion 1.3.3. This form is particularly expressive, as we shall see in Chapter 2, 
when one wants to discuss the truth value of conjunctions of literals in relation 
to THREE. 


1.3.1 Ordered Spaces of Valuations in General 


Usually, the set $\mathcal{T}$ of truth values carries an order, $\leq$, in
which $(\mathcal{T}, \leq)$ is perhaps a complete partial order, complete upper
semi-lattice, complete lattice, or Scott domain, with bottom element $\bot$,
say, or even a bilattice¹⁹ when equipped with two compatible orderings. When
$\mathcal{T}$ carries an ordering, $\leq$, we can define the corresponding
pointwise ordering on $I(X, \mathcal{T})$, denoted by $\sqsubseteq$, in which
$v_1 \sqsubseteq v_2$ if and only if $v_1(x) \leq v_2(x)$ for all $x \in X$.
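
As a small illustration of the pointwise ordering, the following Python sketch represents a valuation on a finite set $X$ as a dictionary and lifts an ordering on truth values to valuations. The function names and the choice of TWO with the truth ordering are assumptions made for the example only.

```python
from typing import Callable, Dict, Hashable, Iterable

Valuation = Dict[Hashable, str]


def leq_two(a: str, b: str) -> bool:
    """Truth ordering on TWO = {f, t}: f is below t, and each value is below itself."""
    return a == b or (a == "f" and b == "t")


def leq_pointwise(v1: Valuation, v2: Valuation,
                  leq: Callable[[str, str], bool]) -> bool:
    """v1 is below v2 iff v1(x) <= v2(x) for every x in the common domain X."""
    if v1.keys() != v2.keys():
        raise ValueError("valuations must share the same domain X")
    return all(leq(v1[x], v2[x]) for x in v1)


def bottom_valuation(X: Iterable[Hashable], bot: str) -> Valuation:
    """The valuation mapping every element of X to the bottom truth value."""
    return {x: bot for x in X}


X = ["p", "q"]
v1 = {"p": "f", "q": "t"}
v2 = {"p": "t", "q": "t"}
print(leq_pointwise(v1, v2, leq_two))
print(leq_pointwise(v2, v1, leq_two))
print(leq_pointwise(bottom_valuation(X, "f"), v1, leq_two))
```

The ordering on truth values is passed in as a parameter, which reflects the remark that the ordering on valuations is derived entirely from the ordering on the truth set.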

It is routine to check that the ordering $\sqsubseteq$ is in fact a partial
order if $\leq$ is one. Moreover, if $\mathcal{T}$ has a bottom element,
$\bot$, then the valuation which maps each $x$ in $X$ to $\bot$ serves as a
bottom element in $I(X, \mathcal{T})$, and we may denote this valuation simply
by $\bot$ again, without causing confusion. Finally, if $(\mathcal{T}, \leq)$
is a Scott domain, we shall say that a valuation $v$ in $I(X, \mathcal{T})$ is
finite if $v(x)$ is

19 A (complete) bilattice is a set $D$ carrying two partial orders in each of which $D$ is a
(complete) lattice. In addition, the two orderings are required to interact with each other
so as to obtain various distributive laws.
a compact element in $(\mathcal{T}, \leq)$ for each $x \in X$, and the set
$\{x \in X \mid v(x) \neq \bot\}$ is finite.

The structural properties of $I(X, \mathcal{T})$ may be summarized in the
following result.²⁰

1.3.2 Theorem Let $X$ be a non-empty set, let $(\mathcal{T}, \leq)$ be an
ordered set of truth values with bottom element $\bot$, and let
$I(X, \mathcal{T})$ be endowed with the pointwise ordering and bottom element
just defined.


(a) If $(\mathcal{T}, \leq)$ is a partially ordered set, then so is $I(X, \mathcal{T})$.

(b) If $(\mathcal{T}, \leq)$ is an $\omega$-complete partial order, then so is $I(X, \mathcal{T})$.

(c) If $(\mathcal{T}, \leq)$ is a complete partial order, then so is $I(X, \mathcal{T})$.

(d) If $(\mathcal{T}, \leq)$ is a complete upper semi-lattice, then so is $I(X, \mathcal{T})$.

(e) If $(\mathcal{T}, \leq)$ is a complete lattice, then so is $I(X, \mathcal{T})$.

(f) If $(\mathcal{T}, \leq)$ is a Scott domain, then so is $I(X, \mathcal{T})$. In this case, the
compact elements of $I(X, \mathcal{T})$ are the finite valuations.


Proof: (a) As already noted, it is routine in this case to verify that the
ordering on $I(X, \mathcal{T})$ is a partial ordering, with bottom element as
already specified.

(b) The argument in this case is similar to the next and is omitted.

(c) If $M \subseteq I(X, \mathcal{T})$ is directed, then it is easy to check
that, for each $x \in X$, the set $\{v(x) \mid v \in M\}$ is directed and hence
has a supremum in $\mathcal{T}$. It is now clear that the valuation $v_M$
defined on $X$ by $v_M(x) = \bigsqcup \{v(x) \mid v \in M\}$ is the supremum,
$\bigsqcup M$, of $M$ in $I(X, \mathcal{T})$. Indeed, for any directed subset
$M \subseteq I(X, \mathcal{T})$, $\bigsqcup M$ satisfies the following
relationship: for each $x \in X$, $(\bigsqcup M)(x) = \bigsqcup (M(x))$, where
$M(x)$ denotes the set $\{v(x) \mid v \in M\}$.

(d) By the argument used in (c), the supremum $\bigsqcup M$ exists for any
directed subset $M$ of $I(X, \mathcal{T})$. Also, for any subset $M$ of
$I(X, \mathcal{T})$, we have that $\bigsqcap M$ exists and is defined by
$(\bigsqcap M)(x) = \bigsqcap (M(x))$ for each $x \in X$, where again $M(x)$
denotes the set $\{v(x) \mid v \in M\}$.

(e) It is clear from the argument in (c) that any subset $M$ of
$I(X, \mathcal{T})$ has a supremum in $I(X, \mathcal{T})$, and, from (d), $M$
has an infimum in $I(X, \mathcal{T})$.

(f) We begin by showing that the finite valuations are compact elements.
Suppose that $v$ is a finite valuation and that
$\{x \in X \mid v(x) \neq \bot\} = \{x_1, \dots, x_n\}$. Suppose that
$M = \{u_k \mid k \in K\}$ is a directed set of valuations in
$I(X, \mathcal{T})$ such that $v \sqsubseteq \bigsqcup M$. Let $x_i$ be an
arbitrary element of $\{x_1, \dots, x_n\}$. Then we have that
$v(x_i) \leq (\bigsqcup M)(x_i) = \bigsqcup (M(x_i))$, that $v(x_i)$ is a
compact element, and that $\{u_k(x_i) \mid k \in K\}$ is directed. Therefore,
there is $u_{k_i} \in M$ such that $v(x_i) \leq u_{k_i}(x_i)$, and we obtain
such $u_{k_i}$ for $i = 1, \dots, n$. Since $M$ is directed, there is
$u \in M$ such that $u_{k_i} \sqsubseteq u$ for $i = 1, \dots, n$, and it now
clearly follows that $v \sqsubseteq u$. Hence, $v$ is compact.


20 For further details here and in the next three subsections, see [Seda, 2002].
In the converse direction, suppose that $u$ is any valuation on $X$. Let $M$
denote the set of all finite valuations $v$ such that $v \sqsubseteq u$. Let
$v_1, v_2 \in M$ and suppose that $x \in X$ is such that not both $v_1(x)$ and
$v_2(x)$ are equal to the bottom element (there are only finitely many such
$x$, of course). Noting that $\mathrm{approx}(u(x))$ in $\mathcal{T}$ is
directed, that $v_1(x), v_2(x) \in \mathrm{approx}(u(x))$ and by considering
one-point valuations (namely, those valuations $w$ such that $w(x)$ is not
equal to the bottom element for at most one value of $x$), we see that there is
$v_3(x) \in \mathrm{approx}(u(x))$ such that both $v_1(x) \leq v_3(x)$ and
$v_2(x) \leq v_3(x)$. It follows that there is an element $v_3$ of $M$ such
that $v_1 \sqsubseteq v_3$ and $v_2 \sqsubseteq v_3$ and, hence, that $M$ is
directed. Moreover, given $x \in X$ and any $a \in \mathrm{approx}(u(x))$, let
$v_x^a$ denote the one-point valuation which satisfies $v_x^a(x) = a$ and
$v_x^a(y) = \bot$ for all $y \neq x$. Then $v_x^a \in M$, and
$\bigsqcup \{v_x^a(x) \mid a \in \mathrm{approx}(u(x))\} = u(x)$. Thus,
$\bigsqcup M = u$.

It now follows from the observations just made that if $u$ is compact, then
there is $v \in M$ such that $u \sqsubseteq v$, and hence the set
$\{x \in X \mid u(x) \neq \bot\}$ is finite. We claim that $u(x)$ is a compact
element in $(\mathcal{T}, \leq)$ for each $x \in X$. Suppose otherwise, that
is, that there is $x_0 \in X$ with $u(x_0)$ non-compact in
$(\mathcal{T}, \leq)$. Then there is a directed set $N$ in $\mathcal{T}$ with
$u(x_0) \leq \bigsqcup N$ for which there is no $n \in N$ with
$u(x_0) \leq n$. Define the family $\mathcal{N}$, consisting of the elements
$u_n$ of $I(X, \mathcal{T})$, $n \in N$, by setting $u_n(x) = u(x)$ for all
$x \neq x_0$ and setting $u_n(x_0) = n$. Then $\mathcal{N}$ is directed and
$u \sqsubseteq \bigsqcup \{u_n \mid n \in N\}$, yet we do not have
$u \sqsubseteq u_n$ for any $n \in N$. This contradicts the fact that $u$ is a
compact element, and hence, for each $x \in X$, $u(x)$ is a compact element.
Thus, the compact elements are indeed the finite valuations, and, moreover, we
now see that $\mathrm{approx}(u)$ is directed and that
$\bigsqcup \mathrm{approx}(u) = u$ for each valuation
$u \in I(X, \mathcal{T})$.

Finally, if $u_1$ and $u_2$ are two consistent finite elements in
$I(X, \mathcal{T})$, then the valuation $v$ defined by
$v(x) = \bigsqcup \{u_1(x), u_2(x)\}$, for each $x \in X$, is the supremum of
$u_1$ and $u_2$ (and is, in fact, a finite element). This completes the
proof. ∎


1.3.2 Valuations in Two-Valued and Other Logics 


The most prominent declarative semantics for logic programs employ classical
two-valued logic, three-valued logic, or, to a lesser extent, four-valued
logic. The corresponding truth sets $\mathcal{T}$ for these logics are TWO,
THREE, and FOUR, as already discussed. We examine these cases next in some
detail in light of Theorem 1.3.2 and also introduce some convenient notation
for these special cases. We begin by considering the orderings involved on the
three sets of truth values that we are currently discussing.

In the case of classical two-valued logic, the ordering usually taken is the
truth ordering. This is the partial ordering $\leq_t$ satisfying $f <_t t$ and
is often denoted just by $\leq$; it turns TWO into a complete lattice with f as
the bottom element.

For three-valued logic, there are two natural orderings usually considered: 
FIGURE 1.1: Hasse diagrams for THREE (left) and FOUR (right).


the knowledge ordering $\leq_k$ and the truth ordering $\leq_t$. The first of
these, $\leq_k$, is the partial order indicated by the Hasse diagram to the
left in Figure 1.1 in which u is the bottom element. This ordering turns THREE
into a complete upper semi-lattice, but not a complete lattice. The second
ordering, $\leq_t$, is the partial ordering satisfying $f <_t u$ and
$u <_t t$; it turns THREE into a complete lattice with f as the bottom
element.

Finally, on FOUR, there are again the two orderings: $\leq_k$, the knowledge
ordering, and $\leq_t$, the truth ordering. They are indicated by the Hasse
diagram on the right-hand side of Figure 1.1. In each of them, FOUR is a
complete lattice and indeed is a complete bilattice, with bottom elements as
indicated by the Hasse diagram.

At this point, having defined the orderings we want on FOUR, it will be 
convenient to record the definition we use of implication before resuming the 
study of orderings on valuations. Note that the definition reduces to mate- 
rial implication in two-valued logic and gives the definition we want later for 
Kleene’s strong three-valued logic. 


1.3.3 Definition For all truth values $t_1$ and $t_2$ in FOUR, we define
implication by taking the truth value of $t_1 \leftarrow t_2$ to be f if and
only if $t_1 <_t t_2$ in the truth ordering $\leq_t$, and t otherwise.
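
To make Definition 1.3.3 concrete, here is a minimal Python sketch that encodes the two orderings on FOUR as sets of strict pairs and evaluates implication. The encoding (f at the bottom and t at the top of the truth ordering, u at the bottom and b at the top of the knowledge ordering, with the two middle elements incomparable in each case) is the usual one for Belnap's logic and is assumed here, since Figure 1.1 is not reproduced; all names are illustrative.

```python
FOUR = ("u", "f", "t", "b")

# Strict pairs (a, b) meaning a < b, assumed from the usual Hasse diagrams of FOUR.
LT_T = {("f", "u"), ("f", "b"), ("f", "t"), ("u", "t"), ("b", "t")}  # truth ordering
LT_K = {("u", "f"), ("u", "t"), ("u", "b"), ("f", "b"), ("t", "b")}  # knowledge ordering


def lt_t(a: str, b: str) -> bool:
    """Strict truth ordering: a <_t b."""
    return (a, b) in LT_T


def implies(t1: str, t2: str) -> str:
    """Truth value of t1 <- t2: f exactly when t1 <_t t2, and t otherwise."""
    return "f" if lt_t(t1, t2) else "t"


# Restricted to TWO = {f, t}, this is material implication read right to left.
for head in ("f", "t"):
    for body in ("f", "t"):
        print(f"{head} <- {body} has value {implies(head, body)}")
```

Note that incomparable arguments, such as u and b, yield the value t under this definition, because neither is strictly below the other in the truth ordering.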


In each of the three cases we are considering, the truth set $\mathcal{T}$ is
easily seen to be a Scott domain in the truth ordering and also in the
knowledge ordering in the latter two cases. Furthermore, each element is
compact. Therefore, on applying Theorem 1.3.2 with the induced pointwise
orderings involved, we obtain the following result, which summarizes the
previous discussion.
1.3.4 Theorem Let $X$ be an arbitrary set. Then the following statements
hold.


(a) In case $\mathcal{T}$ is the truth set TWO, the set $I(X, \mathcal{T})$ is a complete lattice in
the ordering $\sqsubseteq_t$.


(b) In case $\mathcal{T}$ is the truth set THREE, the set $I(X, \mathcal{T})$ is a complete upper
semi-lattice in the ordering $\sqsubseteq_k$, but not a complete lattice, and is a
complete lattice in the ordering $\sqsubseteq_t$.


(c) In case $\mathcal{T}$ is the truth set FOUR, the set $I(X, \mathcal{T})$ is a complete lattice in
each of the orderings $\sqsubseteq_k$ and $\sqsubseteq_t$.


Furthermore, in each case and in each ordering, the set $I(X, \mathcal{T})$ is
a Scott domain whose compact elements are precisely those valuations $v$ for
which the set $\{x \in X \mid v(x) \neq \bot\}$ is finite, where $\bot$
denotes the appropriate bottom element. ∎


Notice that the order structure here is independent of the actual logic
involved, as distinct from the underlying truth set. Thus, for example,
Kleene’s strong and weak three-valued logics give rise to precisely the same
order structure on $I(X, \mathrm{THREE})$; the difference between them is in
the definitions of the connectives, rather than in their order structure.

We next take up the point made in Remark 1.3.1, concerning the repre- 
sentation of a valuation in terms of the sets on which it takes various truth 
values in TWO, THREE, or FOUR. 

Let $v$ be a valuation, and let $v_u = v^{-1}(u)$, let $v_f = v^{-1}(f)$, let
$v_t = v^{-1}(t)$, and let $v_b = v^{-1}(b)$; these sets are pairwise disjoint
subsets of $X$, and some may be empty. A valuation $v$ taking values in TWO is
clearly completely determined by the subset $I = v_t$ of $X$ and therefore can
be identified with $I$. A valuation taking values in THREE can be identified
either with the pair $I = (v_t, v_f)$ of subsets of $X$ or with the pair
$I = (v_t, v_u)$. The former choice will be made when we are concerned with the
ordering $\sqsubseteq_k$, so that the bottom element is u and this is also the
“default” value in the sense that $v_u = X \setminus (v_t \cup v_f)$. The
latter choice will be made when we are concerned with the ordering
$\sqsubseteq_t$, so that the bottom element is f and this is also the default
value in that $v_f = X \setminus (v_t \cup v_u)$. Finally, a valuation $v$ with
values in FOUR can be identified either with the triple $I = (v_t, v_f, v_b)$
of subsets of $X$ when u is the bottom and default value or with the triple
$I = (v_t, v_u, v_b)$ when f is the bottom and default value.

Conversely, a subset $I$ of $X$ determines a valuation
$v : X \to \mathrm{TWO}$ with the property that $v(x) = t$ if and only if
$x \in I$. Given the ordering $\sqsubseteq_k$, a pair $I = (I_t, I_f)$ of
disjoint subsets of $X$ determines a valuation $v : X \to \mathrm{THREE}$ which
takes value t on $I_t$, takes value f on $I_f$, and, by default, takes value u
on $X \setminus (I_t \cup I_f)$. Similarly, given the ordering
$\sqsubseteq_t$, a pair $I = (I_t, I_u)$ of disjoint subsets of $X$ determines
a valuation $v : X \to \mathrm{THREE}$ which takes value t on $I_t$, takes
value u on $I_u$, and, by default, takes value f on
$X \setminus (I_t \cup I_u)$. Precisely the same remarks apply to triples
$I = (I_t, I_f, I_b)$ and to triples $I = (I_t, I_u, I_b)$ in relation to
valuations $v : X \to \mathrm{FOUR}$.
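
The passage between mappings and tuples of sets can be sketched in Python for the three-valued case. The function names below are hypothetical, valuations are assumed to be dictionaries over a finite set $X$, and the suffixes `_k` and `_t` indicate which ordering, and hence which default value, is in force.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

Valuation = Dict[Hashable, str]
Pair = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


def fibre(v: Valuation, value: str) -> FrozenSet[Hashable]:
    """The set of points of X on which v takes the given truth value."""
    return frozenset(x for x, val in v.items() if val == value)


def to_pair_k(v: Valuation) -> Pair:
    """Pair (I_t, I_f); the default value u is left implicit."""
    return fibre(v, "t"), fibre(v, "f")


def to_pair_t(v: Valuation) -> Pair:
    """Pair (I_t, I_u); the default value f is left implicit."""
    return fibre(v, "t"), fibre(v, "u")


def from_pair(X: Iterable[Hashable], pair: Pair, second: str, default: str) -> Valuation:
    """Rebuild the valuation: t on the first set, `second` on the second, default elsewhere."""
    first_set, second_set = pair
    if first_set & second_set:
        raise ValueError("the two coordinate sets must be disjoint")
    return {x: "t" if x in first_set else second if x in second_set else default
            for x in X}


X = ["p", "q", "r"]
v = {"p": "t", "q": "f", "r": "u"}
print(to_pair_k(v))
print(to_pair_t(v))
print(from_pair(X, to_pair_k(v), second="f", default="u") == v)
print(from_pair(X, to_pair_t(v), second="u", default="f") == v)
```

Both round trips recover the original valuation, which is the sense in which the identification loses no information once the default value is fixed.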

This passage between mappings and tuples of subsets will often be made 
without explicit mention. However, as noted in Remark 1.3.1, we will, in the 
main, use the term valuation to refer to mappings and the term interpretation 
to refer to tuples of sets, and it will be convenient to employ the following 
terminology. 


1.3.5 Definition A valuation or interpretation taking values in TWO, 
THREE, or FOUR will be called two-valued, three-valued, or four-valued, 
respectively. 


The identification above, of valuations with tuples of sets, carries the point- 
wise ordering of valuations over to the “pointwise” ordering of interpretations, 
and we employ exactly the same notation for the orderings in the correspond- 
ing cases. We obtain the following result, whose proof is straightforward and 
will be omitted. (There is the possibility of confusion here unless one remem- 
bers that coordinate positions in the tuples are labelled with truth values and 
that the truth value not present is the default value. Thus, for example, in 
the case of three-valued valuations, the two coordinate positions are either 
ordered with t and f in that order or ordered with t and u in that order, 
and similarly for four-valued valuations. The only way to avoid this minor 
irritation is to use pairs of sets to represent two-valued valuations, triples of 
sets to represent three-valued valuations, and quadruples of sets to represent 
four-valued valuations. However, this is not customarily done.) 


1.3.6 Theorem The following statements hold in relation to interpretations
on $X$.


(a) If $I$ and $K$ are two-valued interpretations, then $I \sqsubseteq_t K$ if and only if
$I \subseteq K$ as subsets of $X$. The bottom element for the set of two-valued
interpretations is given by the empty set, $\emptyset$.


(b) If $I$ and $K$ are three-valued interpretations, then $I \sqsubseteq_k K$ if and only
if $I_t \subseteq K_t$ and $I_f \subseteq K_f$. Also, $I \sqsubseteq_t K$ if and only if
$I_t \subseteq K_t$ and $K_f \subseteq I_f$. In both orderings, the bottom element for the
set of three-valued interpretations is given by the appropriate pair
$(\emptyset, \emptyset)$.


(c) If $I$ and $K$ are four-valued interpretations, then $I \sqsubseteq_k K$ if and only if
$I_t \subseteq K_t \cup K_b$, $I_f \subseteq K_f \cup K_b$, and $I_b \subseteq K_b$. Also,
$I \sqsubseteq_t K$ if and only if $I_t \subseteq K_t$, $I_u \subseteq K_u \cup K_t$, and
$I_b \subseteq K_b \cup K_t$. In both orderings, the bottom element for the set of
four-valued interpretations is given by the appropriate triple
$(\emptyset, \emptyset, \emptyset)$. ∎
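
Part (b) of Theorem 1.3.6 can be checked exhaustively on a small example. The Python sketch below compares the pointwise orderings on three-valued valuations over a two-element set with the set-inclusion conditions stated in the theorem. The set $X$, the encoding of the orderings, and all function names are assumptions made for illustration; the check is a sanity test and not a proof.

```python
from itertools import product

X = ("p", "q")
THREE = ("u", "f", "t")

# Reflexive orderings on THREE, written as sets of pairs (a, b) with a <= b.
LEQ_K = {(a, a) for a in THREE} | {("u", "f"), ("u", "t")}
LEQ_T = {(a, a) for a in THREE} | {("f", "u"), ("u", "t"), ("f", "t")}


def fibre(v, value):
    """The set of points of X on which v takes the given truth value."""
    return {x for x in X if v[x] == value}


def pointwise(v, w, leq):
    """Pointwise comparison of two valuations on X."""
    return all((v[x], w[x]) in leq for x in X)


def theorem_b_holds():
    """Compare the pointwise orderings with the inclusions in Theorem 1.3.6(b)."""
    valuations = [dict(zip(X, values)) for values in product(THREE, repeat=len(X))]
    for v, w in product(valuations, repeat=2):
        sets_k = fibre(v, "t") <= fibre(w, "t") and fibre(v, "f") <= fibre(w, "f")
        sets_t = fibre(v, "t") <= fibre(w, "t") and fibre(w, "f") <= fibre(v, "f")
        if pointwise(v, w, LEQ_K) != sets_k or pointwise(v, w, LEQ_T) != sets_t:
            return False
    return True


print(theorem_b_holds())
```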
Notice that the difference in the form of the statements in (b) and (c) in
Theorem 1.3.6 concerning the truth ordering $\sqsubseteq_t$ results from the
fact that $\leq_t$ is a total order in (b), but it is not a total order in (c).

In all cases we are currently considering, except one, we are working in a
complete lattice. Hence, the valuation mapping each element of $X$ to the
appropriate top element is itself a top element. The one exception is the case
of three-valued interpretations in the order $\sqsubseteq_k$. In this case, it
is clear that those interpretations $I = (I_t, I_f)$ for which
$I_t \cup I_f = X$ are maximal elements for the ordering $\sqsubseteq_k$.
Moreover, each maximal element $I = (I_t, I_f)$ gives rise to the two-valued
interpretation $I_t$, and, conversely, each two-valued interpretation $I$ gives
rise to a maximal three-valued interpretation $(I, X \setminus I)$. Moreover,
this correspondence is evidently one-to-one. Thus, the two-valued
interpretations can be thought of as maximal three-valued interpretations.
Indeed, the maximal elements are called total interpretations, while the
remaining elements are called partial interpretations.


1.3.3 Signed Sets and Three-Valued Interpretations


As mentioned in Remark 1.3.1, there is an alternative and useful way of
thinking of three-valued interpretations relative to the ordering
$\sqsubseteq_k$ (so that u is the current default value in the representation
of interpretations as pairs of sets), and we consider it next.

Let $X$ denote an arbitrary set, and form the set $\neg X$ of symbols $\neg x$
for $x \in X$. If $X$ happens to be a set of atoms or of literals, then
$\neg x$ is meaningful; otherwise, we are working formally. In any case, we
assume that $x$ and $\neg x$ are never equal. Given a subset $I$ of $X$, we let
$\neg I$ denote the subset of $\neg X$ consisting of those $\neg x$ for
$x \in I$. A subset of $X \cup \neg X$ is called a signed subset of $X$ and is
called consistent if it does not contain both $x$ and $\neg x$ for any $x$.
Clearly, any signed subset of $X$ has the form $I^+ \cup \neg I^-$, where $I^+$
and $I^-$ are subsets of $X$, and is consistent if and only if $I^+$ and $I^-$
are disjoint.

Every consistent signed subset $I = I^+ \cup \neg I^-$ of $X$ gives rise to the
three-valued interpretation $(I^+, I^-)$. Then, thinking of $I$ as this
three-valued interpretation, we have $I_t = I^+ = \{x \in X \mid x \in I\}$ and
$I_f = I^- = \{x \in X \mid \neg x \in I\}$. Conversely, every three-valued
interpretation $I = (I_t, I_f) = (I^+, I^-)$ gives rise to the consistent
signed subset $I^+ \cup \neg I^-$ of $X$. Moreover, this correspondence is
evidently one-to-one, and so $I(X, \mathrm{THREE})$ can be identified with the
set of all consistent signed subsets of $X$, and we will quite frequently use
this fact later on without further notice. Indeed, in this representation, we
have $I \sqsubseteq_k K$ if and only if
$I^+ \cup \neg I^- \subseteq K^+ \cup \neg K^-$, and so $\sqsubseteq_k$
corresponds to subset inclusion of signed subsets, and, furthermore, the bottom
element is the empty set thought of as a consistent signed subset of $X$.

Now let $X$ denote a set of atoms in a first-order language $\mathcal{L}$, and
let $I$ be a three-valued interpretation viewed as a consistent signed subset
of $X$. For a literal $L = A$, where $A$ is an atom, we write $L \in I$ if
$A \in I$, and we write $\neg L \in I$ if $\neg A \in I$. Similarly, if
$L = \neg A$, we write $L \in I$ if $\neg A \in I$, and we write
$\neg L \in I$ if $A \in I$. Using these observations, we now say that a
literal $L$ is true in $I$ if $L \in I$, that $L$ is false in $I$ if
$\neg L \in I$, and that $L$ is undefined in $I$ otherwise. Notice that these
facts depend on, and indeed are equivalent to, defining the negation operator
$\neg$ from THREE into itself by means of Table 1.1, so that $\neg(t) = f$,
$\neg(f) = t$, and $\neg(u) = u$.
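
The signed-set view of three-valued interpretations lends itself to a direct encoding. In the following Python sketch, an atom is represented by a string and a negated atom by the pair `("not", atom)`; this representation and the function names are assumptions made for the example.

```python
def neg(literal):
    """Complement of a literal: x becomes ("not", x), and ("not", x) becomes x."""
    return literal[1] if isinstance(literal, tuple) else ("not", literal)


def is_consistent(signed_set):
    """A signed set is consistent if it never contains both x and its negation."""
    return all(neg(literal) not in signed_set for literal in signed_set)


def truth_value(literal, I):
    """Three-valued truth value of a literal in a consistent signed set I."""
    if literal in I:
        return "t"
    if neg(literal) in I:
        return "f"
    return "u"


def to_pair(I):
    """Recover the pair (I+, I-) from the signed set I."""
    positive = frozenset(l for l in I if not isinstance(l, tuple))
    negative = frozenset(l[1] for l in I if isinstance(l, tuple))
    return positive, negative


I = {"p", ("not", "q")}
K = {"p", ("not", "q"), "r"}
print(is_consistent(I), to_pair(I))
print([truth_value(l, I) for l in ("p", "q", "r", ("not", "q"), ("not", "r"))])
print(I <= K)  # the knowledge ordering is subset inclusion of signed sets
```

In this encoding the truth value of a negated literal is obtained by swapping t and f and leaving u fixed, in agreement with the negation operator described above.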

Finally, we note that four-valued interpretations can be treated in the same 
sort of way as we have just handled three-valued interpretations by including 
inconsistent signed sets in the discussion, but we omit the details of this as 
we have no need of them. 


1.3.4 Operators on Spaces of Valuations 


As we have seen, an ordering on a space $\mathcal{T}$ of truth values induces
an ordering on the corresponding spaces $I(X, \mathcal{T})$. Similarly, various
connectives defined on $\mathcal{T}$ induce operators defined on
$I(X, \mathcal{T})$, and we close this chapter by briefly discussing these
next. They will be considered further in Chapter 3.

In fact, we concentrate on Belnap’s four-valued logic, in which the truth 
set is FOUR and the connectives are determined by the truth table, Ta- 
ble 1.1. Since classical two-valued logic and Kleene’s strong three-valued logic 
are sublogics of FOUR, they are subsumed in our discussion of FOUR and 
therefore need not be considered separately. 

The first of these operators arises through negation, and is the operator
mapping $I(X, \mathcal{T})$ into itself, and still denoted by $\neg$, in which
$(\neg v)(x) = \neg(v(x))$ for each $x \in X$, where $v$ is an arbitrary
element of $I(X, \mathcal{T})$.

Likewise, the connectives $\vee$ and $\wedge$ determine (binary) operators
mapping $I(X, \mathcal{T}) \times I(X, \mathcal{T})$ into $I(X, \mathcal{T})$
defined by $(u \vee v)(x) = u(x) \vee v(x)$ and
$(u \wedge v)(x) = u(x) \wedge v(x)$, for each $x \in X$, where $u$ and $v$ are
arbitrary elements of $I(X, \mathcal{T})$. We note that the overloading of the
symbols $\vee$ and $\wedge$ should not cause any difficulties. Of course, one
can deal similarly with other connectives, such as implication.

If $v_1, v_2 \in I(X, \mathcal{T})$ satisfy the conditions
$v_1 \sqsubseteq_t v_2$, $v_1(x) = f$ and $v_2(x) = t$ for some $x$, then it is
clear that $\neg v_1 \not\sqsubseteq_t \neg v_2$. Hence, $\neg$ is not
monotonic in this case. Thus, $\neg$ is not order continuous in the truth
orderings $\sqsubseteq_t$. It is, however, order continuous in the orderings
$\sqsubseteq_k$, as we shall see in Chapter 3, where we also consider the
continuity of the other operators $\vee$ and $\wedge$.
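
The behaviour of negation with respect to the two orderings can be checked by brute force on the truth set itself; since the operator on valuations is defined pointwise, a failure of monotonicity on FOUR transfers to the space of valuations. The Python sketch below assumes the usual encoding of the two orderings on FOUR and the negation with $\neg(b) = b$; it tests monotonicity only, which is weaker than the order continuity treated in Chapter 3.

```python
FOUR = ("u", "f", "t", "b")
NEG = {"t": "f", "f": "t", "u": "u", "b": "b"}

# Strict pairs (a, b) meaning a < b, assumed from the usual Hasse diagrams of FOUR.
LT_T = {("f", "u"), ("f", "b"), ("f", "t"), ("u", "t"), ("b", "t")}  # truth ordering
LT_K = {("u", "f"), ("u", "t"), ("u", "b"), ("f", "b"), ("t", "b")}  # knowledge ordering


def leq(strict, a, b):
    """Reflexive ordering obtained from a set of strict pairs."""
    return a == b or (a, b) in strict


def is_monotonic(op, strict):
    """Check that a <= b implies op(a) <= op(b) for all a, b in FOUR."""
    return all(leq(strict, op[a], op[b])
               for a in FOUR for b in FOUR if leq(strict, a, b))


print("monotonic in the truth ordering:", is_monotonic(NEG, LT_T))
print("monotonic in the knowledge ordering:", is_monotonic(NEG, LT_K))
```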

The following observation is just one of the many interesting properties
possessed by $I(X, \mathcal{T})$ when we take $\mathcal{T}$ to be the logic
FOUR, as we are currently doing.


1.3.7 Proposition The operators $\vee$ and $\wedge$ are monotonic in each
argument.


Proof: Given $v \in I(X, \mathcal{T})$, it must be shown that the mappings
$u \mapsto u \vee v$ and $u \mapsto v \vee u$ are both monotonic, and, since
$\vee$ is commutative, it suffices to show that either is monotonic. It is
straightforward to check this from the truth table, Table 1.1, and the Hasse
diagram for FOUR, Figure 1.1, and we omit details. Precisely the same comments
apply also to the operator $\wedge$. ∎
Another interesting fact about FOUR, which emerges from its truth table
and its Hasse diagram, is the following result.


1.3.8 Proposition Relative to the truth ordering $\leq_t$ on FOUR, we have
$t_1 \vee t_2 = \bigsqcup \{t_1, t_2\}$ and
$t_1 \wedge t_2 = \bigsqcap \{t_1, t_2\}$ for all truth values $t_1$ and
$t_2$. In particular, in classical two-valued logic and Kleene’s strong
three-valued logic relative to $\leq_t$, we have
$t_1 \vee t_2 = \max\{t_1, t_2\}$ and $t_1 \wedge t_2 = \min\{t_1, t_2\}$ for
all truth values $t_1$ and $t_2$.
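
Propositions 1.3.7 and 1.3.8 suggest a simple computational reading: disjunction and conjunction on FOUR can be computed as the join and meet in the truth ordering, and their monotonicity can then be tested exhaustively. The Python sketch below does this under the usual encoding of the two orderings on FOUR, which is assumed here; the function names are illustrative, and checking monotonicity in both orderings is a choice made for the example.

```python
FOUR = ("u", "f", "t", "b")

# Strict pairs (a, b) meaning a < b, assumed from the usual Hasse diagrams of FOUR.
LT_T = {("f", "u"), ("f", "b"), ("f", "t"), ("u", "t"), ("b", "t")}  # truth ordering
LT_K = {("u", "f"), ("u", "t"), ("u", "b"), ("f", "b"), ("t", "b")}  # knowledge ordering


def leq(strict, a, b):
    """Reflexive ordering obtained from a set of strict pairs."""
    return a == b or (a, b) in strict


def join_t(a, b):
    """Disjunction: the least upper bound of a and b in the truth ordering."""
    uppers = [c for c in FOUR if leq(LT_T, a, c) and leq(LT_T, b, c)]
    return next(c for c in uppers if all(leq(LT_T, c, d) for d in uppers))


def meet_t(a, b):
    """Conjunction: the greatest lower bound of a and b in the truth ordering."""
    lowers = [c for c in FOUR if leq(LT_T, c, a) and leq(LT_T, c, b)]
    return next(c for c in lowers if all(leq(LT_T, d, c) for d in lowers))


def monotonic_in_each_argument(op, strict):
    """Check a <= b implies op(a, c) <= op(b, c) and op(c, a) <= op(c, b)."""
    return all(leq(strict, op(a, c), op(b, c)) and leq(strict, op(c, a), op(c, b))
               for a in FOUR for b in FOUR for c in FOUR if leq(strict, a, b))


# Print the conjunction and disjunction columns of the truth table.
for p in FOUR:
    for q in FOUR:
        print(p, q, meet_t(p, q), join_t(p, q))

print(all(monotonic_in_each_argument(op, order)
          for op in (join_t, meet_t) for order in (LT_T, LT_K)))
```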


Chapter 2 


The Semantics of Logic Programs 


The objective of this chapter is to introduce the central topic of study in this 
work, namely, logic programs, together with several of the main issues and 
questions which will be addressed in later chapters. In order to ensure that 
our treatment is as self-contained as possible, we will take care to formally 
define all concepts which we consider in detail here and later on. In addition, to 
assist the reader, we give ample references to those topics which we encounter, 
but do not treat in detail. 

For the course of this and subsequent chapters, our main focus will be on
declarative semantics, and, as already noted in the Introduction, issues
concerning procedural aspects will play only a minor role. In particular, in
this chapter and later in Chapter 5, we will introduce some of the best known
declarative semantics for logic programs, and we will develop a uniform
treatment of them applicable not only to resolution-based logic programming,
but also to non-monotonic reasoning.

Frequently, a declarative semantics is given by assigning intended models to
logic programs. This is done by selecting, from the set of all models for a
logic program, a subset which contains those models with some properties deemed
to be desirable depending on one’s objectives and intended applications. All
the semantics which we will discuss can be described in terms of fixed points
of operators associated with logic programs, and they are all well-established.
Our novel contribution in this chapter is the development of a uniform
and operator-free characterization of them.

Our first task, however, is to introduce formally some of the basic concepts 
and notation which will be needed throughout the sequel. 


2.1 Logic Programs and Their Models 


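2.1.1 Definition Given a first-order language $\mathcal{L}$, a clause, program
clause, or rule in $\mathcal{L}$ is a formula of the form

$$(\forall x_1) \dots (\forall x_l)(A \leftarrow L_1 \wedge \dots \wedge L_n),$$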


1 We follow the presentation of semantics from [Hitzler and Wendt, 2002, Hitzler, 2003b,
Hitzler and Wendt, 2005, Hitzler, 2005, Knorr and Hitzler, 2007].
where $l, n \in \mathbb{N}$; $A$ is an atom in $\mathcal{L}$;
$L_1, \dots, L_n$ are literals in $\mathcal{L}$; and $x_1, \dots, x_l$ are all
the variable symbols occurring in the formula. We will follow common practice
and abbreviate such a clause by writing simply


$$A \leftarrow L_1, \dots, L_n,$$


so that the universal quantifiers are understood, and the conjunction symbol
$\wedge$ is replaced by a comma. The atom $A$ is called the head of the clause,
and the conjunction $L_1, \dots, L_n$ is called the body of the clause; the
literals $L_i$, $i = 1, \dots, n$, in the body $L_1, \dots, L_n$ are called
body literals. If a body literal $L$ is an atom $B$, say, then we say that $B$
occurs positively in the body of the clause. If $L$ is a negated atom
$\neg B$, then we say that $B$ occurs negatively in the body of the clause. By
an abuse of notation, we allow $n = 0$, by which we mean that the body can be
empty, and in this case the clause $A \leftarrow$, or simply $A$, is also
called a unit clause or a fact. It will sometimes be convenient to further
abbreviate a clause by writing


A ← body, 


wherein body denotes the body of the clause. Furthermore, we will use body 
not only to denote a conjunction of literals, but also to denote the correspond- 
ing set containing these literals. This further abuse of notation will substan- 
tially ease matters in some places and will not cause confusion. Note that in 
doing this, we are ignoring the ordering of the literals in clause bodies. This will 
not matter most of the time, since we are not much concerned with procedural 
matters, as already noted, and for this reason we often denote a typical clause 
by A ← A_1,...,A_n,¬B_1,...,¬B_k, say, where all the A_i, i = 1,...,n, and all 
the B_j, j = 1,...,k, are atoms in ℒ. Notice that we allow ourselves a bit of 
latitude in the subscripts we employ in writing down clauses, and, for example, 
the roles of n in the clause just considered and in the clause A ← L_1,...,L_n 
above are not identical in general, unless there are no negated atoms present, 
of course. 

A normal logic program is a finite set of clauses. A definite logic program 
is a normal logic program in which no negation symbols occur. The term 
program will subsequently always mean a normal program. Definite programs 
are sometimes called positive programs, and obviously every definite program 
is a normal program. A propositional logic program is a program in which all 
predicate symbols are of arity zero. 


In most cases, the underlying first-order language, or simply the underlying 
language, ℒ_P of a program P will not be given explicitly, but will be 
understood to be the (first-order) language generated by the constant, variable, 
function, and predicate symbols occurring in P. However, when P does not 
contain any constant symbols, we add one to ℒ_P, so that the underlying 
language always contains at least one constant symbol. Propositional programs 
will be treated slightly differently, and we will return to this point later; see 
the examples following Definition 2.1.6. 

We illustrate Definition 2.1.1 with a number of example programs, to which 
we will return frequently later. At all times, unless stated to the contrary, 
we will adhere to the following notational conventions concerning programs: 
constant, function, and predicate symbols start with a lowercase letter and 
are set in typewriter font unless they consist of a single letter only; variable 
symbols start with an uppercase letter. 


2.1.2 Program (Tweety1) Let Tweety1² be the program consisting of the 
following clauses. 


penguin(tweety) ← 
bird(bob) ← 
bird(X) ← penguin(X) 
flies(X) ← bird(X), ¬penguin(X) 


Tweety1 is intended to represent the following knowledge: tweety is a penguin, 
bob is a bird, all penguins are birds, and every bird which is not a penguin 
can fly. 


2.1.3 Program (Even) Let Even be the program consisting of the following 
clauses. 


even(a) ← 
even(s(X)) ← ¬even(X) 


The intended meaning of this program is as follows: a is the natural number 
0, and s is the successor function on natural numbers. Thus, the program 
represents the knowledge that 0 is even, and if some number is not even, then 
its successor is even. 


Many of our later examples will be variations of the Even program theme 
and will employ the successor notation for natural numbers. Consider, for 
example, the following program. 


2.1.4 Program (Length) Let Length be the program consisting of the fol- 
lowing clauses. 


length([],a) ← 
length([H|T],s(X)) ← length(T,X) 


Following Prolog conventions, [] denotes the empty list, and [· | ·] denotes 
a binary function whose intended meaning is the list constructor whose first 
argument is the head of the list and whose second argument is its tail. Thus, 
the program Length is intended to be a recursive definition of the “length of 
lists” using the successor notation for natural numbers as in Program 2.1.3. 
Length is an example of a definite program. 


2We borrow Tweety programs, in which a penguin usually called Tweety appears, from 
the literature discussing the semantics of non-monotonic reasoning. 


Thus far, we have specified the syntax of logic programs. We now turn our 
attention to dealing with their semantics, and this is based on Definitions 1.2.7 
and 1.2.8 of Chapter 1 with some notation peculiar to logic programming. 


2.1.5 Definition Let P be a program with underlying language ℒ_P, and 
let D be a non-empty set. A preinterpretation J for P with domain D is a 
preinterpretation J for ℒ_P with domain D. 

Let J be a preinterpretation for the program P, with domain D, and 
let θ be a J-variable assignment. For a typical clause C in P of the form 
A ← A_1,...,A_n,¬B_1,...,¬B_k, we let (Cθ)^J denote 


(Aθ)^J ← (A_1θ)^J,...,(A_nθ)^J,¬(B_1θ)^J,...,¬(B_kθ)^J. 


We call (Cθ)^J a J-ground instance³ of C. By ground_J(P), we denote the set 
of all J-ground instances of clauses in P. We denote by B_{P,J} the set B_{ℒ_P,J} of 
all J-ground instances of atoms in ℒ_P, that is, the collection of all elements 
of the form p(d_1,...,d_n), where p is an n-ary predicate symbol in ℒ_P and 
d_1,...,d_n ∈ D. Usually, we will be working over a fixed, but arbitrary, 
preinterpretation J. In order to ease notation, we will often omit mention of J if it 
causes no confusion, and instead of writing B_{P,J}, ground_J(P), J-ground 
instance, etc., we will simply write B_P, ground(P), ground instance, etc. We will 
frequently abuse notation even further by referring to elements of ground_J(P) 
as (ground) clauses and by applying to ground clauses terminology, such as 
“definite”, already defined for program clauses. 


Of particular interest is the so-called Herbrand preinterpretation of a pro- 
gram. Its importance rests on the fact that, for many purposes, restricting 
to Herbrand preinterpretations causes no loss of generality.⁴ For example, in 
classical first-order logic, a set of clauses has a model if and only if it has a Her- 
brand model. Indeed, in many cases in the literature on the subject, discussions 
of logic programming semantics refer only to Herbrand (pre)interpretations 
and Herbrand models. 


2.1.6 Definition Given a program P with underlying language ℒ_P, the 
Herbrand universe U_P of P is the set of all ground terms in ℒ_P. The Herbrand 
preinterpretation J, say, for P, has domain U_P and assigns constant and 
function symbols as follows, where we use the notation of Definition 1.2.7. 


(1) For each constant symbol c ∈ ℒ_P, c^J is equal to c. 


(2) For each n-ary function symbol f ∈ ℒ_P, f^J : U_P^n → U_P is the mapping 
defined by f^J(t_1,...,t_n) = f(t_1,...,t_n). 


3This extends the notion of a J-ground instance of an atom to a J-ground instance of a 
clause, see [Lloyd, 1987, Page 12]. 

4Nevertheless, we prefer to formulate the basic definitions in complete generality. For one 
thing this is at no extra cost, and for another we require quite general preinterpretations in 
our treatment of acceptable programs in Chapter 5. 


We illustrate these definitions by discussing some of the previous examples 
in relation to them. For this purpose, and indeed for all example programs 
unless otherwise noted, we consider the Herbrand preinterpretation. 

For the program Tweety1 (Program 2.1.2), we obtain 


U_Tweety1 = {bob, tweety}, 

B_Tweety1 = {penguin(bob), penguin(tweety), 
             bird(bob), bird(tweety), 
             flies(bob), flies(tweety)}, 


and ground(Tweety1) consists of the following clauses. 


penguin(tweety) ← 
bird(bob) ← 
bird(tweety) ← penguin(tweety) 
bird(bob) ← penguin(bob) 
flies(tweety) ← bird(tweety), ¬penguin(tweety) 
flies(bob) ← bird(bob), ¬penguin(bob) 
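
The grounding of Tweety1 over its Herbrand universe can be reproduced mechanically. The following is a minimal sketch only: the tuple encoding of atoms and clauses and all function names are our own illustrative assumptions, and the sketch handles constants but no function symbols (with function symbols the Herbrand universe is infinite).

```python
from itertools import product

# Hypothetical encoding (not from the text): an atom is (predicate, args);
# terms are strings, and a term starting with an uppercase letter is a variable.
# A clause is (head, positive_body, negative_body).
TWEETY1 = [
    (("penguin", ("tweety",)), [], []),
    (("bird", ("bob",)), [], []),
    (("bird", ("X",)), [("penguin", ("X",))], []),
    (("flies", ("X",)), [("bird", ("X",))], [("penguin", ("X",))]),
]

def is_var(term):
    return term[0].isupper()

def atoms_in(clause):
    head, pos, neg = clause
    return [head, *pos, *neg]

def herbrand_universe(program):
    # Collects constants only; a program without constants is not handled here.
    return {t for c in program for (_, args) in atoms_in(c) for t in args if not is_var(t)}

def herbrand_base(program, universe):
    preds = {(p, len(args)) for c in program for (p, args) in atoms_in(c)}
    return {(p, combo) for (p, n) in preds for combo in product(sorted(universe), repeat=n)}

def substitute(atom, theta):
    pred, args = atom
    return (pred, tuple(theta.get(t, t) for t in args))

def ground(program, universe):
    """All ground instances of all clauses, one per variable assignment."""
    result = []
    for clause in program:
        head, pos, neg = clause
        variables = sorted({t for (_, args) in atoms_in(clause) for t in args if is_var(t)})
        for values in product(sorted(universe), repeat=len(variables)):
            theta = dict(zip(variables, values))
            result.append((substitute(head, theta),
                           tuple(substitute(a, theta) for a in pos),
                           tuple(substitute(a, theta) for a in neg)))
    return result

U = herbrand_universe(TWEETY1)
B = herbrand_base(TWEETY1, U)
G = ground(TWEETY1, U)
```

Under these assumptions, U has two elements, and B and G have six elements each, in agreement with the sets listed above.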


For the successor notation used in the program Even (Program 2.1.3), 
the following convention will be convenient: for n ∈ ℕ, we denote the term 
s(s(...s(x)...)), with n occurrences of s, by s^n(x). We then obtain for the 
Even program 


U_Even = {s^n(a) | n ∈ ℕ}, 
B_Even = {even(s^n(a)) | n ∈ ℕ}, 


and ground(Even) consists of the following clauses. 


even(a) ← 
even(s^(n+1)(a)) ← ¬even(s^n(a)) for all n ∈ ℕ 


We note that the set ground(Even) is infinite. In fact, ground(Even) can 
be thought of as an infinite propositional program consisting of clauses p_0 ← 
and p_(n+1) ← ¬p_n, where, for each n ∈ ℕ, p_n is a propositional variable 
replacing even(s^n(a)). Often, it is conceptually easier to think of ground(P) as 
a (countably) infinite propositional program and to study it rather than P. 
Indeed, many authors even define a logic program to be a set of propositional 
clauses, with the advantage that notation can be considerably eased in some 
places. For many of the example programs which we will discuss later, we will 
also take advantage of this simpler notation, as in the following. 


2.1.7 Program Let P be the following program. 


p ← ¬q 
q ← ¬p 


Then B_P = {p,q} and ground(P) = P. Preinterpretations play no role in this 
case. 


We now come to the fundamental notions of interpretation and model 
for programs. Interpretations and models as defined next are the particular 
forms of Definitions 1.2.7 and 1.2.8 that we will use henceforth in studying 
the semantics of programs. 


2.1.8 Definition Let P be a program, let J be a preinterpretation for P 
with domain D, and let 𝒯 be a logic. An interpretation or valuation for P 
(based on J) with values in 𝒯 is an interpretation or valuation defined on 
B_{P,J} with values in 𝒯. An interpretation I for P is a model for P if I(C) = t 
for each clause C ∈ ground_J(P). As in Definition 1.2.8, we sometimes refer 
to valuations, interpretations, and models for P based on J as J-valuations, 
J-interpretations, and J-models, respectively. 


We will henceforth use the notation I_{P,J,2} for the set of all two-valued 
interpretations for P based on J. As usual, reference to the preinterpretation 
J will often be omitted if it is fixed and understood. Similarly, the number 
2 will be omitted if it is understood, and hence the set of all two-valued 
interpretations for P based on a given, fixed preinterpretation J will often be 
denoted by I_{P,2} or just by I_P. Similar comments apply to the set I_{P,J,3} of 
all three-valued interpretations for P based on J and to the set I_{P,J,4} of all 
four-valued interpretations for P based on J. 

The three sets just defined have the order-theoretic structure described in 
Theorem 1.3.4 relative to the orders we discussed in Chapter 1. In particular, 
I_{P,2} can be identified with the power set of B_P. 

With these structures in place, we are now ready to begin the main subject 
of our study in this chapter, namely, the semantics of logic programs. 


2.2 Supported Models 


As already noted, a declarative semantics for logic programs is usually 
given by selecting models for the programs which satisfy certain desirable 
conditions. This selection is often most conveniently described by an operator, 
mapping interpretations to interpretations, whose fixed points are exactly the 
models being sought. In this section, we will introduce the first of a number of 
operators we study in the context of declarative semantics, namely, the single- 
step or immediate consequence operator due to Kowalski and van Emden, 
see [van Emden and Kowalski, 1976]. The single-step operator was historically 
the first to be studied in relation to logic programming semantics and in 
many ways is the most natural. Indeed, it turns out that for definite programs 
the single-step operator is order continuous and that its least fixed point, as 
given by Kleene’s theorem, Theorem 1.1.9, accords well with a programmer’s 
expectations of what a declarative semantics should be and how it should 
relate to the procedural semantics.⁵ 

For the remainder of this section and for the next, we will work in classical 
two-valued logic. Hence, I_P or I_{P,J} means I_{P,J,2} here and in the subsequent 
section, where J is a given preinterpretation, and we will on occasions remind 
the reader of this notational convenience. 

The following is an important definition. 


2.2.1 Definition Let P be a normal logic program, and let J be a 
preinterpretation for ℒ_P. The single-step operator or immediate consequence operator 
T_{P,J} : I_{P,J} → I_{P,J} is defined, for I ∈ I_{P,J}, by setting T_{P,J}(I) to be the set 
of all A ∈ B_{P,J} for which there is a clause A ← L_1,...,L_n in ground_J(P) 
satisfying I ⊨ L_1 ∧ ... ∧ L_n, that is, satisfying I(L_1 ∧ ... ∧ L_n) = t. 


Consistent with our earlier remarks concerning notation, we will usually 
denote T_{P,J} simply by T_P when J is understood. Furthermore, we will 
sometimes find it convenient to refer to T_P as the T_P-operator. 

The importance of the immediate consequence operator is clear from the 
following proposition. 


2.2.2 Proposition The models for P are exactly the pre-fixed points of T_P. 


Proof: Let I ∈ I_P be a model for P, and let A ∈ T_P(I). Then there is a 
clause A ← L_1,...,L_n in ground(P) with I(L_1 ∧ ... ∧ L_n) = t; let us denote 
this clause by C. Since I is a model for P, we have I(C) = t. Hence, I(A) = t, 
and so A ∈ I, giving T_P(I) ⊆ I, as required. 

Conversely, suppose T_P(I) ⊆ I, and let A ← L_1,...,L_n be a clause C in 
ground(P) with I(L_1 ∧ ... ∧ L_n) = t. Then A ∈ T_P(I) ⊆ I. Hence, I(A) = t, 
and in consequence I(C) = t, as required. Clauses in ground(P) whose body is 
not true in I are satisfied trivially. ∎ 
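
For a finite ground program, the statement of Proposition 2.2.2 can be checked by brute force. The sketch below does this for ground(Tweety1); the encoding of ground clauses as triples of strings and the helper names are illustrative assumptions of ours, not notation from the text.

```python
from itertools import combinations

# A ground clause is (head, positive_body, negative_body); atoms are strings.
GROUND_TWEETY1 = [
    ("penguin(tweety)", (), ()),
    ("bird(bob)", (), ()),
    ("bird(tweety)", ("penguin(tweety)",), ()),
    ("bird(bob)", ("penguin(bob)",), ()),
    ("flies(tweety)", ("bird(tweety)",), ("penguin(tweety)",)),
    ("flies(bob)", ("bird(bob)",), ("penguin(bob)",)),
]

def atoms_of(program):
    return sorted({a for (h, pos, neg) in program for a in (h, *pos, *neg)})

def body_true(clause, interp):
    _, pos, neg = clause
    return all(a in interp for a in pos) and all(b not in interp for b in neg)

def tp(program, interp):
    """Immediate consequence operator: heads of clauses whose body is true."""
    return frozenset(c[0] for c in program if body_true(c, interp))

def is_model(program, interp):
    """Every clause is satisfied: a true body forces a true head."""
    return all(c[0] in interp or not body_true(c, interp) for c in program)

def interpretations(program):
    """All two-valued interpretations, i.e. all subsets of the Herbrand base."""
    atoms = atoms_of(program)
    for r in range(len(atoms) + 1):
        for subset in combinations(atoms, r):
            yield frozenset(subset)

models = [i for i in interpretations(GROUND_TWEETY1) if is_model(GROUND_TWEETY1, i)]
pre_fixed = [i for i in interpretations(GROUND_TWEETY1) if tp(GROUND_TWEETY1, i) <= i]
```

The two lists coincide, as the proposition predicts; the enumeration is exponential in the number of atoms and is meant only as an illustration.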


The notion of model is far too general to capture the declarative semantics 
of logic programs without some restrictions being imposed upon it. Indeed, B_P 
itself is always a model for P, but in general B_P fails by far to give a 
reasonable “intended meaning” for a program. Standard approaches to declarative 
semantics therefore involve the imposition of certain additional conditions 
which models must satisfy in order to qualify as intended models. However, 
just what conditions it is reasonable to choose in this context depends on one’s 
particular understanding of what “intended” could mean, and the remainder 
of this chapter will be devoted, in the main, to the presentation and study of 
different conditions which have been proposed in the literature to solve this 
problem. 


5 As already noted, we will not consider procedural aspects in depth and instead refer 
the reader to [Apt, 1997, Lloyd, 1987] for details of procedural semantics. 


The observation that B_P is too large, as a two-valued model, suggests the 
selection of minimal models. Of particular interest are the cases when there 
exists a least model. 


2.2.3 Theorem Let P be a definite program, and let J denote a fixed 
preinterpretation for ℒ_P. Then the following statements hold. 


(a) T_P is order continuous on I_P. 


(b) P has a least (J-)model, which coincides with the least fixed point of T_P 
and is equal to T_P ↑ ω. 


(c) The intersection of any non-empty collection of (J-)models for P is itself a 
model for P. Therefore, a definite program cannot have two distinct 
minimal models. Furthermore, the intersection of the collection of all models 
for P coincides with the least model for P. 


Proof: (a) We first show that T_P is monotonic. Let I, K ∈ I_P with I ⊆ K, 
and suppose A ∈ T_P(I). Then there is a clause A ← body in ground(P) with 
body ⊆ I. Hence, body ⊆ K, and so A ∈ T_P(K), as required. 

Now let ℐ = {I_λ | λ ∈ Λ} be a directed family of two-valued 
interpretations, and let I = ⊔ℐ = ⋃ℐ. Since the order under consideration is 
set-inclusion and T_P is monotonic, we immediately have that T_P(ℐ) is 
directed. By the remarks following Definition 1.1.7, it remains to show that 
T_P(I) ⊆ ⋃T_P(ℐ). So suppose that A belongs to T_P(I). Then there is a 
(definite) clause C of the form A ← A_1,...,A_n in ground(P) satisfying 
A_1,...,A_n ∈ I. Therefore, there exist I_{λ_1},...,I_{λ_n} in ℐ with A_i ∈ I_{λ_i} for 
i = 1,...,n. Since ℐ is directed, there is I_λ ∈ ℐ with I_{λ_i} ⊆ I_λ for i = 1,...,n. 
Hence, the body of C is true in I_λ, and we obtain that A ∈ T_P(I_λ) and, 
consequently, that A ∈ ⋃T_P(ℐ), as required. 

(b) By (a), we can apply Kleene’s theorem, Theorem 1.1.9, to see that 
T_P has a least pre-fixed point, that this least pre-fixed point is in fact the 
least fixed point of T_P, and that it coincides with T_P ↑ ω. Hence, by 
Proposition 2.2.2, T_P ↑ ω is the least model for P. 

(c) The details of the proof of this claim are straightforward and therefore 
are omitted. ∎ 
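
For a finite definite ground program, the iteration behind part (b) terminates after finitely many steps, so the least model can be computed directly. The following sketch uses the negation-free clauses of ground(Tweety1) purely as an illustrative input; the encoding and the function names are our own assumptions. It also checks part (c) by intersecting all models, which are found as pre-fixed points by exhaustive enumeration.

```python
from itertools import combinations

# Definite ground clauses as (head, body); an illustrative choice of program.
DEFINITE = [
    ("penguin(tweety)", ()),
    ("bird(bob)", ()),
    ("bird(tweety)", ("penguin(tweety)",)),
    ("bird(bob)", ("penguin(bob)",)),
]

def tp(program, interp):
    return frozenset(h for (h, body) in program if all(a in interp for a in body))

def kleene_stages(program):
    """Return the stages T_P↑0, T_P↑1, ... up to the first repetition.

    Termination relies on the program being finite; for infinite ground
    programs the limit T_P↑ω need not be reached in finitely many steps.
    """
    stages = [frozenset()]
    while True:
        nxt = tp(program, stages[-1])
        if nxt == stages[-1]:
            return stages
        stages.append(nxt)

def all_models(program):
    atoms = sorted({a for (h, body) in program for a in (h, *body)})
    subsets = (frozenset(s) for r in range(len(atoms) + 1)
               for s in combinations(atoms, r))
    return [i for i in subsets if tp(program, i) <= i]  # models = pre-fixed points

stages = kleene_stages(DEFINITE)
least_model = stages[-1]
intersection = frozenset.intersection(*all_models(DEFINITE))
```

Here the stages form an increasing chain, and the last stage agrees with the intersection of all models.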


It can be shown, furthermore, that the least model for definite programs 
corresponds rather well with the procedural behaviour of logic programming 
systems based on resolution.⁶ Thus, in summary, the least model semantics is 
very satisfactory for definite logic programs from all points of view. 

Attempts to generalize Theorem 2.2.3 to normal programs, however, fail 
in several ways, as we show next. 


2.2.4 Program Let P be the normal logic program consisting of the following 
clauses. 


p ← ¬q 
q ← ¬p 
r ← ¬r 


Then {p,r} and {q,r} are minimal, but incomparable, models so that P has 
no least model, T_P has no fixed points at all (and hence P has no supported 
models, see Proposition 2.2.6), and, since T_P(∅) = {p,q,r} and T_P({p,q,r}) = 
∅, we see that T_P is not monotonic. 
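
The claims made about Program 2.2.4 can be confirmed by enumerating all eight interpretations. This is a small sketch under our own encoding of propositional clauses as (head, positive body, negative body); the names are illustrative.

```python
from itertools import combinations

# Program 2.2.4: p ← ¬q, q ← ¬p, r ← ¬r.
P = [("p", (), ("q",)), ("q", (), ("p",)), ("r", (), ("r",))]
ATOMS = ["p", "q", "r"]

def tp(program, interp):
    return frozenset(
        h for (h, pos, neg) in program
        if all(a in interp for a in pos) and all(b not in interp for b in neg)
    )

subsets = [frozenset(s) for k in range(len(ATOMS) + 1) for s in combinations(ATOMS, k)]
models = [i for i in subsets if tp(P, i) <= i]                   # pre-fixed points
minimal = [m for m in models if not any(k < m for k in models)]  # no smaller model
fixed_points = [i for i in subsets if tp(P, i) == i]

# Non-monotonicity: the empty set is below {p, q, r}, but the images are reversed.
image_of_empty = tp(P, frozenset())
image_of_full = tp(P, frozenset(ATOMS))
```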


It is not entirely clear how to cope with the negative results presented by 
Program 2.2.4. Various different methods have been discussed in the literature, 
leading to different declarative semantics with varying degrees of success. We 
will discuss the more prominent of these approaches in the remainder of this 
chapter. 

A rather straightforward attack is to study minimal models instead of least 
models. However, consider the program Even (Program 2.1.3) with models 


K_1 = {even(s^(2n)(a)) | n ∈ ℕ} and 
K_2 = {even(a)} ∪ {even(s^(2n+1)(a)) | n ∈ ℕ}. 


Both models are minimal, but it seems to be rather obvious that K_1 captures 
the intended meaning of Even, while K_2 does not. Essentially, this arises from 
the fact that even(s(a)) is true with respect to K_2, although the program 
itself gives no justification for this. Thus, it would seem intuitively reasonable 
that whenever an atom is true in an intended model for a program P, then 
it should be true for a reason provided by the program itself. This idea is 
captured by the following definition, see [Apt et al., 1988]. 


2.2.5 Definition An interpretation I for a program P is called supported if 
for each A ∈ I there is a clause A ← body in ground(P) with I(body) = t. 


Continuing the Even program discussion above, note that K_1 is supported, 
whereas K_2 is not. Indeed, K_1 is the only supported model for Even, as we 
will see later. So, for some programs, supportedness is an appropriate require- 
ment of models. Supportedness is also captured by the immediate consequence 
operator, as follows. 


6A detailed account of resolution-based logic programming can be found in [Apt, 1997, 
Lloyd, 1987]. 


2.2.6 Proposition The supported interpretations for a program P are 
exactly the post-fixed points of T_P. The supported models for P are exactly the 
fixed points of T_P. 


Proof: Let I be a supported interpretation for P, and suppose that A ∈ I. 
Then there is a clause A ← body in ground(P) with I(body) = t. But then 
A ∈ T_P(I), showing that I ⊆ T_P(I), as required to see that I is a post-fixed 
point of T_P. 

Conversely, assume that I is a post-fixed point of T_P, that is, I ⊆ T_P(I), and 
let A ∈ I. Then A ∈ T_P(I). Therefore, there exists a clause A ← body in ground(P) 
with I(body) = t, showing that I is a supported interpretation for P. 

Finally, using Proposition 2.2.2, we obtain that an interpretation for P 
is a supported model for P if and only if it is both a pre-fixed point and a 
post-fixed point of T_P, that is, if and only if it is a fixed point of T_P. ∎ 
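
The contrast between minimal and supported models of Even can be explored on a finite truncation of ground(Even). The sketch below is only an approximation under stated assumptions: the bound N, the integer encoding of even(s^n(a)) as n, and the helper names are ours, and the truncated program is not the program Even itself. The second interpretation is taken to consist of even(a) together with the odd-indexed atoms, since every model must contain the fact even(a).

```python
from itertools import combinations

N = 6  # truncation bound: an assumption of this sketch, not part of Even
# Atom n stands for even(s^n(a)); clauses are (head, positive_body, negative_body).
EVEN_N = [(0, (), ())] + [(n + 1, (), (n,)) for n in range(N)]
ATOMS = list(range(N + 1))

def tp(program, interp):
    return frozenset(
        h for (h, pos, neg) in program
        if all(a in interp for a in pos) and all(b not in interp for b in neg)
    )

subsets = [frozenset(s) for k in range(len(ATOMS) + 1) for s in combinations(ATOMS, k)]
models = [i for i in subsets if tp(EVEN_N, i) <= i]
minimal = [m for m in models if not any(k < m for k in models)]

K1 = frozenset(n for n in ATOMS if n % 2 == 0)
K2 = frozenset({0}) | frozenset(n for n in ATOMS if n % 2 == 1)

def is_supported(interp):
    # By Proposition 2.2.6, supportedness is the post-fixed point condition.
    return interp <= tp(EVEN_N, interp)

supported_models = [i for i in subsets if tp(EVEN_N, i) == i]
```

On this truncation both interpretations are minimal models, but only the first is supported, and it is the only fixed point of the operator.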


2.2.7 Example Tweety1 from Program 2.1.2 has supported model M, where 
M = {penguin(tweety), bird(bob), bird(tweety), flies(bob)}, as is easily 
verified. Careful inspection will also convince the reader that M is the unique 
supported model for Tweety1, and we give a formal proof of this in 
Example 5.1.7. 
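
Since ground(Tweety1) is finite, the uniqueness claim of Example 2.2.7 can also be checked by exhaustive search over the 64 interpretations, using Proposition 2.2.6 to identify supported models with fixed points. The encoding and names below are illustrative assumptions; the formal proof referred to in the text is a different argument.

```python
from itertools import combinations

GROUND_TWEETY1 = [
    ("penguin(tweety)", (), ()),
    ("bird(bob)", (), ()),
    ("bird(tweety)", ("penguin(tweety)",), ()),
    ("bird(bob)", ("penguin(bob)",), ()),
    ("flies(tweety)", ("bird(tweety)",), ("penguin(tweety)",)),
    ("flies(bob)", ("bird(bob)",), ("penguin(bob)",)),
]
ATOMS = sorted({a for (h, pos, neg) in GROUND_TWEETY1 for a in (h, *pos, *neg)})

def body_true(pos, neg, interp):
    return all(a in interp for a in pos) and all(b not in interp for b in neg)

def tp(interp):
    return frozenset(h for (h, pos, neg) in GROUND_TWEETY1 if body_true(pos, neg, interp))

def is_supported(interp):
    """Definition 2.2.5 read literally: each true atom has a clause with true body."""
    return all(
        any(h == a and body_true(pos, neg, interp) for (h, pos, neg) in GROUND_TWEETY1)
        for a in interp
    )

subsets = [frozenset(s) for k in range(len(ATOMS) + 1) for s in combinations(ATOMS, k)]
supported_models = [i for i in subsets if tp(i) == i]
M = frozenset({"penguin(tweety)", "bird(bob)", "bird(tweety)", "flies(bob)"})
```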


From a procedural point of view in the context of resolution-based logic 
programming, supported models are better than minimal ones. They capture 
the probable intention of a programmer who may think of a clause as a form 
of equivalence⁷ rather than as an implication. 

Since the least model for a definite program is a fixed point, by Theo- 
rem 2.2.3, we obtain as a corollary that the least model is always supported. 
In proving Theorem 2.2.3, we applied Kleene’s theorem. For normal programs, 
this theorem is not applicable, nor is the Knaster-Tarski theorem, due to the 
non-monotonicity of the immediate consequence operator in general. In order 
to study the supported model semantics, that is, in order to obtain fixed points 
of non-monotonic immediate consequence operators, it seems natural to em- 
ploy fixed-point theorems for mappings which are not necessarily monotonic. 
This is the main theme of Chapter 4. 


2.3 Stable Models 


One of the drawbacks of the supported model semantics is that definite 
programs may have more than one supported model. 


7One formal approach to understanding clauses as equivalences is via the notion of the 
Clark completion of a program and is related to SLDNF-resolution, see [Clark, 1978]. 


2.3.1 Program Let P be the program consisting of the single clause p ← p. 
Then both ∅ and {p} are supported models for P. 


This unsatisfactory situation is resolved by the introduction of stable mod- 
els. Before we give the definition, let us make the following observation. 


2.3.2 Proposition The least model T_P ↑ ω for a definite program P is the 
unique model M for P satisfying the following condition: there exists a 
mapping l : B_P → α, for some ordinal α, such that for each A ∈ M there is a 
clause A ← body in ground(P) with M(body) = t and l(B) < l(A) for each 
B ∈ body. 


Proof: To start with, take M to be the least model T_P ↑ ω, choose α = ω, 
and define l : B_P → α by setting l(A) = min{n | A ∈ T_P ↑ (n+1)}, if A ∈ M, 
and by setting l(A) = 0, if A ∉ M. Since ∅ ⊆ T_P ↑ 1 ⊆ ... ⊆ T_P ↑ n ⊆ ... ⊆ 
T_P ↑ ω = ⋃_{m<ω} T_P ↑ m, for each n, we see that l is well-defined and that the 
least model T_P ↑ ω for P has the desired properties. 

Conversely, if M is a model for P which satisfies the given condition for 
some mapping l : B_P → α, then it is easy to show, by induction on l(A), that 
A ∈ M implies A ∈ T_P ↑ (l(A) + 1). This yields that M ⊆ T_P ↑ ω and hence 
that M = T_P ↑ ω by minimality of the model T_P ↑ ω. ∎ 
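
The level mapping constructed in the first half of the proof can be read off from the stages of the iteration. The sketch below does this for a small definite ground program (the negation-free clauses of ground(Tweety1) together with flies(bob) ← bird(bob), chosen by us for illustration) and then checks the condition of Proposition 2.3.2. The encoding and function names are assumptions of the sketch, and it applies to finite programs only.

```python
# Definite ground clauses as (head, body); an illustrative input program.
DEFINITE = [
    ("penguin(tweety)", ()),
    ("bird(bob)", ()),
    ("bird(tweety)", ("penguin(tweety)",)),
    ("bird(bob)", ("penguin(bob)",)),
    ("flies(bob)", ("bird(bob)",)),
]

def tp(program, interp):
    return frozenset(h for (h, body) in program if all(a in interp for a in body))

def least_model_with_levels(program):
    """Iterate T_P from the empty set; an atom first seen in stage n+1 gets level n."""
    level, current, n = {}, frozenset(), 0
    while True:
        nxt = tp(program, current)
        for atom in nxt - current:
            level[atom] = n
        if nxt == current:
            return current, level
        current, n = nxt, n + 1

M, level = least_model_with_levels(DEFINITE)

def condition_holds(program, model, level):
    """Each A in the model has a clause with true body and strictly lower body levels."""
    return all(
        any(
            h == a
            and all(b in model for b in body)
            and all(level.get(b, 0) < level[a] for b in body)
            for (h, body) in program
        )
        for a in model
    )
```

Atoms outside the least model receive level 0 through `level.get(b, 0)`, mirroring the convention used in the proof.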


Mappings l from B_P into an ordinal are commonly called level mappings. 
They will play an important role in several places in the book. On occasions, 
we will need to extend such a mapping to literals, and unless stated to the 
contrary, we will always assume that the extension satisfies l(¬A) = l(A) for 
all atoms A. 

The following definition of stable model merges the property of the least 
model just established with that of supportedness.⁸ 


2.3.3 Definition An interpretation I for a program P is called a 
well-supported interpretation if there exists a level mapping l : B_P → α, for some 
ordinal α, with the property that, for each A ∈ I, there is a clause C in 
ground(P) of the form A ← A_1,...,A_n,¬B_1,...,¬B_k such that the body of 
C is true in I and l(A_i) < l(A) for i = 1,...,n. A well-supported model for P 
is called a stable model for P. 


2.3.4 Theorem The following statements hold. 
(a) Every stable model is supported, but not vice-versa. 
(b) Every stable model is a minimal model, but not vice-versa. 


(c) Every definite program has a unique stable model, which is its least model. 


8It is shown in [Fages, 1994] that stable models can be introduced as in Definition 2.3.3. 
The original formulation used the Gelfond–Lifschitz operator from Definition 2.3.6. 


Proof: (a) Supportedness of stable models follows immediately from the 
definition. The supported model {p} for Program 2.3.1 is not stable. 

(b) Let P be a program, let M be a stable model for P, and let l be 
a level mapping with respect to which M is well-supported. Assume that 
K is a model for P with K ⊊ M. Then there exists A ∈ M \ K, and 
we can assume without loss of generality that A is also such that l(A) is 
minimal. By the well-supportedness of M, there is a clause C of the form 
A ← A_1,...,A_n,¬B_1,...,¬B_k in ground(P) such that for i = 1,...,n and 
j = 1,...,k we have A_i ∈ M, l(A) > l(A_i) and B_j ∉ M. Since K ⊆ M, we 
obtain, for j = 1,...,k, that B_j ∉ K, and by minimality of l(A) we obtain 
A_i ∈ K for i = 1,...,n. Since K is a model for P and the body of C is true 
with respect to K, we conclude that A ∈ K, which contradicts the assumption 
that A ∈ M \ K. Hence, M must be a minimal model. 

In the opposite direction, Program 2.3.5 below has {p} as its only model, 
and hence, this is a minimal model. It is clearly not a stable model, however. 

(c) By Proposition 2.3.2, we see that the least model is indeed stable. 
Uniqueness follows from (b) and Theorem 2.2.3 (c). ∎ 


There are programs with unique supported models which are not stable. 
2.3.5 Program The program P consisting of the two clauses 


p ← p 
p ← ¬p 


has unique supported model {p}, and this model is not stable. 


Every stable model is a minimal model by Theorem 2.3.4 (b). If a 
program has a least model, however, this model is not guaranteed to be stable, 
as Program 2.3.5 shows in having {p} as its only model. 

A characterization of stable models as fixed points of an operator can be 
given, and we proceed with this next. 


2.3.6 Definition Let P be a normal logic program, and let I ∈ I_P. The 
Gelfond–Lifschitz transform P/I of P is the set of all clauses A ← A_1,...,A_n 
for which there exists a clause A ← A_1,...,A_n,¬B_1,...,¬B_k in ground(P) 
with B_1,...,B_k ∉ I. 

We note that the Gelfond–Lifschitz transform P/I of a program P is always 
definite (as a set of ground clauses) and therefore has a least model T_{P/I} ↑ ω 
by Theorem 2.2.3. The operator GL_P : I ↦ T_{P/I} ↑ ω is called the 
Gelfond–Lifschitz operator⁹ associated with P. 


2.3.7 Theorem The following hold. 


(a) The Gelfond–Lifschitz operator is antitonic and, in general, is not 
monotonic. 


(b) An interpretation I is a stable model for a program P if and only if it is 
a fixed point of GL_P, that is, if and only if it satisfies GL_P(I) = I. 


9The Gelfond–Lifschitz operator is named after the authors of the well-known paper 
[Gelfond and Lifschitz, 1988] and was introduced by them in defining the stable model 
semantics. 


Proof: (a) Let P be a program, and let I, K be interpretations for P with 
I ⊆ K. Then P/K ⊆ P/I, and it is a straightforward proof by induction to 
show that T_{P/K} ↑ n ⊆ T_{P/I} ↑ n for all n ∈ ℕ. Hence, GL_P(K) = T_{P/K} ↑ ω ⊆ 
T_{P/I} ↑ ω = GL_P(I), which shows that GL_P is antitonic. To see that it is not 
generally monotonic, take P to be Program 2.3.5. On setting I = ∅, we obtain 
that P/I is the definite program consisting of the clauses p ← p and p ←, and 
GL_P(I) = {p}; on setting I = {p}, we obtain that P/I consists of the single 
clause p ← p, and GL_P(I) = ∅. This establishes (a). 

For (b), we start by supposing that GL_P(I) = T_{P/I} ↑ ω = I. Then I is 
the least model for P/I, and hence, is also a model for P, and, by 
Proposition 2.3.2, is well-supported with respect to any level mapping l satisfying 
l(A) = min{n | A ∈ T_{P/I} ↑ (n + 1)} for each A ∈ I. Conversely, let I be 
a stable model for P. Then I is well-supported relative to some level 
mapping l, say. Thus, for every A ∈ I, there is a clause C in ground(P) of the 
form A ← A_1,...,A_n,¬B_1,...,¬B_k such that the body of C is true in I and 
l(A_i) < l(A) for i = 1,...,n. But then, for every A ∈ I, there is a clause 
A ← A_1,...,A_n in P/I whose body is true in I and such that l(A_i) < l(A) 
for i = 1,...,n. By Proposition 2.3.2, this means that I is the least model for 
P/I, that is, I = T_{P/I} ↑ ω = GL_P(I). ∎ 


The Gelfond–Lifschitz transform can be considered as a two-step process: 
first, delete each ground clause which has a negative literal ¬B in its body 
with B ∈ I; second, delete all negative literals in the bodies of the remaining 
clauses. Indeed, the intuition behind it is as follows. We can think of P as a 
set of premises and of I as a set of beliefs that a rational agent might hold and 
wants to test, given the premises P. Any ground clause that contains ¬B in its 
body, where B ∈ I, is useless to the agent and can be discarded. Among the 
remaining ground clauses, an occurrence of ¬B with B ∉ I is trivially satisfied 
and can be dropped. Thus, we can simplify the premises to P/I. If I happens 
to be the set of atoms that logically follow from P/I, then the agent is rational. 
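
The two-step description translates directly into a short program for finite ground programs. The sketch below is a minimal illustration with our own encoding and names; it computes the Gelfond–Lifschitz operator and finds stable models as its fixed points by exhaustive search, applied to Program 2.1.7 and Program 2.3.5.

```python
from itertools import combinations

def gl_transform(program, interp):
    """P/I: drop clauses containing ¬B with B in I, then strip negative literals."""
    return [(h, pos) for (h, pos, neg) in program if not any(b in interp for b in neg)]

def least_model(definite):
    """Least model of a finite definite ground program by iterating T_P."""
    current = frozenset()
    while True:
        nxt = frozenset(h for (h, pos) in definite if all(a in current for a in pos))
        if nxt == current:
            return current
        current = nxt

def gl(program, interp):
    return least_model(gl_transform(program, interp))

def stable_models(program):
    atoms = sorted({a for (h, pos, neg) in program for a in (h, *pos, *neg)})
    subsets = (frozenset(s) for k in range(len(atoms) + 1)
               for s in combinations(atoms, k))
    return [i for i in subsets if gl(program, i) == i]

# Clauses are (head, positive_body, negative_body).
P_217 = [("p", (), ("q",)), ("q", (), ("p",))]   # p ← ¬q, q ← ¬p
P_235 = [("p", ("p",), ()), ("p", (), ("p",))]   # p ← p, p ← ¬p
```

For Program 2.3.5 the operator swaps ∅ and {p}, so it has no fixed point and the program has no stable model, in line with the proof of Theorem 2.3.7 (a).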


We will now give some examples. 


2.3.8 Example Consider again Tweety1 from Program 2.1.2 and its 
supported model M as given in Example 2.2.7. We show that M is stable. The 
program Tweety1/M is as follows. 


penguin(tweety) ← 
bird(bob) ← 
bird(tweety) ← penguin(tweety) 
bird(bob) ← penguin(bob) 
flies(bob) ← bird(bob) 


The least model for this program turns out to be M, which shows that M is 
stable. 


A strange feature of the supported model semantics is that the addition 
of clauses of the form p ← p may change the semantics. 


2.3.9 Program (Tweety2) Consider the following program Tweety2. 


penguin(tweety) ← 
bird(bob) ← 
bird(X) ← penguin(X) 
flies(X) ← bird(X), ¬penguin(X) 
penguin(bob) ← penguin(bob) 


Tweety2 results from Tweety1 by adding the clause penguin(bob) ← 
penguin(bob). Intuitively, this addition should not change the semantics of 
the program. However, in addition to the supported model M from Example 
2.2.7, Tweety2 also has 

M′ = {penguin(tweety), penguin(bob), bird(tweety), bird(bob)} 

as a supported model. While M is also a stable model for Tweety2, M′ is not. 
This can be seen by inspecting the program Tweety2/M′, as follows, which 
has {penguin(tweety), bird(bob), bird(tweety)} ≠ M′ as its least model. 


penguin(tweety) ← 
bird(bob) ← 
bird(tweety) ← penguin(tweety) 
bird(bob) ← penguin(bob) 
penguin(bob) ← penguin(bob) 
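
The difference between the supported and the stable models of Tweety2 can be confirmed by enumeration. In this sketch, which rests on our own string encoding of ground(Tweety2) and hypothetical helper names, supported models are the fixed points of the immediate consequence operator and stable models are the fixed points of the Gelfond–Lifschitz operator.

```python
from itertools import combinations

# ground(Tweety2) as (head, positive_body, negative_body).
GROUND_TWEETY2 = [
    ("penguin(tweety)", (), ()),
    ("bird(bob)", (), ()),
    ("bird(tweety)", ("penguin(tweety)",), ()),
    ("bird(bob)", ("penguin(bob)",), ()),
    ("flies(tweety)", ("bird(tweety)",), ("penguin(tweety)",)),
    ("flies(bob)", ("bird(bob)",), ("penguin(bob)",)),
    ("penguin(bob)", ("penguin(bob)",), ()),
]
ATOMS = sorted({a for (h, pos, neg) in GROUND_TWEETY2 for a in (h, *pos, *neg)})

def tp(interp):
    return frozenset(
        h for (h, pos, neg) in GROUND_TWEETY2
        if all(a in interp for a in pos) and all(b not in interp for b in neg)
    )

def gl(interp):
    """Least model of the Gelfond-Lifschitz transform with respect to interp."""
    reduct = [(h, pos) for (h, pos, neg) in GROUND_TWEETY2
              if not any(b in interp for b in neg)]
    current = frozenset()
    while True:
        nxt = frozenset(h for (h, pos) in reduct if all(a in current for a in pos))
        if nxt == current:
            return current
        current = nxt

subsets = [frozenset(s) for k in range(len(ATOMS) + 1) for s in combinations(ATOMS, k)]
supported = [i for i in subsets if tp(i) == i]
stable = [i for i in subsets if gl(i) == i]

M = frozenset({"penguin(tweety)", "bird(bob)", "bird(tweety)", "flies(bob)"})
M_PRIME = frozenset({"penguin(tweety)", "penguin(bob)", "bird(tweety)", "bird(bob)"})
```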


We can also use the stable model semantics for modelling choice. 


2.3.10 Program (Tweety3) Consider the program Tweety3, as follows. 


eagle(tweety) ← ¬penguin(tweety) 
penguin(tweety) ← ¬eagle(tweety) 
bird(X) ← eagle(X) 
bird(X) ← penguin(X) 
flies(X) ← bird(X), ¬penguin(X) 


This program has the two stable models 

{eagle(tweety), bird(tweety), flies(tweety)} 

and 

{penguin(tweety), bird(tweety)}. 
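
The two stable models of Tweety3 can be recovered by testing each of the sixteen interpretations of its ground program against the Gelfond–Lifschitz operator. As before, this is a sketch under our own encoding; the ground clauses are written out by hand over the Herbrand universe {tweety}.

```python
from itertools import combinations

# ground(Tweety3) as (head, positive_body, negative_body).
GROUND_TWEETY3 = [
    ("eagle(tweety)", (), ("penguin(tweety)",)),
    ("penguin(tweety)", (), ("eagle(tweety)",)),
    ("bird(tweety)", ("eagle(tweety)",), ()),
    ("bird(tweety)", ("penguin(tweety)",), ()),
    ("flies(tweety)", ("bird(tweety)",), ("penguin(tweety)",)),
]
ATOMS = sorted({a for (h, pos, neg) in GROUND_TWEETY3 for a in (h, *pos, *neg)})

def gl(interp):
    """Least model of the Gelfond-Lifschitz transform with respect to interp."""
    reduct = [(h, pos) for (h, pos, neg) in GROUND_TWEETY3
              if not any(b in interp for b in neg)]
    current = frozenset()
    while True:
        nxt = frozenset(h for (h, pos) in reduct if all(a in current for a in pos))
        if nxt == current:
            return current
        current = nxt

subsets = [frozenset(s) for k in range(len(ATOMS) + 1) for s in combinations(ATOMS, k)]
stable = [i for i in subsets if gl(i) == i]
```

Each stable model corresponds to one way of resolving the choice between eagle(tweety) and penguin(tweety).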


2.4 Fitting Models 


The stable model semantics is more satisfactory than the supported model 
semantics in that each definite program has a unique stable model which 
coincides with its least model. However, for normal logic programs in general, 
uniqueness cannot be guaranteed, as can be seen from Program 2.1.7, which 
has two stable models {p} and {q}. It is desirable to be able to associate with 
each program a unique model in some natural way. One way of doing this is 
by means of three-valued logic, and we discuss this next.¹⁰ 

In fact, we will work with Kleene’s strong three-valued logic as discussed 
in Chapter 1 and, in particular, with the knowledge ordering, ≤_k, on the truth 
values. We find it convenient here to represent three-valued interpretations as 
signed sets, see Section 1.3.3, so that the corresponding ordering ⊑_k is subset 
inclusion of signed sets. 

Given a normal logic program P, we define the following operators T_P and 
F_P on I_P = I_{P,3} = I_{P,J,3}. First, T_P(I) is the set of all A ∈ B_P for which 
there is a clause A ← body in ground(P) with body true in I with respect 
to Kleene’s strong three-valued logic. Second, F_P(I) is the set of all A ∈ B_P 
such that for all clauses A ← body in ground(P) we have that body is false in 
I with respect to Kleene’s strong three-valued logic. Finally, we define 


Φ_P(I) = T_P(I) ∪ ¬F_P(I) 


for all I ∈ I_P. We will call the operator Φ_P the Fitting operator for P or the 
Φ_P-operator. 


¹⁰ The resulting Kripke-Kleene semantics, herein called the Fitting semantics, is due to
Fitting [Fitting, 1985].

Notice that, for any three-valued interpretation I, we have A ∈ Φ_P(I)
whenever A is the head of a ground clause with empty body (a fact) and
¬A ∈ Φ_P(I) whenever there is no ground clause whose head is A.


2.4.1 Example We illustrate the calculation of Φ_P(I), taking P to be the
program Tweety1 and starting with the three-valued interpretation I = ∅
thought of as a signed subset; of course, ∅ gives truth value u to all ground
atoms in our present context.


We have
T_P(∅) = {penguin(tweety), bird(bob)}
and
¬F_P(∅) = ¬{penguin(bob)}.
Therefore,


Φ_P(∅) = {penguin(tweety), bird(bob), ¬penguin(bob)}.
Continuing, we have
T_P(Φ_P(∅)) = {penguin(tweety), bird(bob), bird(tweety), flies(bob)},


and
¬F_P(Φ_P(∅)) = ¬{penguin(bob), flies(tweety)}.


Thus, Φ_P(Φ_P(∅)) = T_P(Φ_P(∅)) ∪ ¬F_P(Φ_P(∅)) is a total three-valued
interpretation. It follows from this fact and Proposition 2.4.4 below that
Φ_P(Φ_P(∅)) is, in fact, the least fixed point of Φ_P, as can readily be checked
in any case by iterating Φ_P once more.
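
The calculation in Example 2.4.1 can be reproduced with a short script. The sketch below is illustrative only: the ground clauses are our reconstruction of Tweety1 (Program 2.1.2) over the constants tweety and bob, chosen to agree with the values computed in the example, and the names fitting_step and fitting_model are hypothetical. A signed set I is stored as a pair (true, false) of sets of atoms.

```python
# Ground clauses as (head, positive body atoms, negated body atoms).
# Assumption: this is ground(Tweety1), reconstructed from Example 2.4.1.
TWEETY1 = [
    ("penguin(tweety)", set(), set()),
    ("bird(bob)", set(), set()),
    ("bird(tweety)", {"penguin(tweety)"}, set()),
    ("bird(bob)", {"penguin(bob)"}, set()),
    ("flies(tweety)", {"bird(tweety)"}, {"penguin(tweety)"}),
    ("flies(bob)", {"bird(bob)"}, {"penguin(bob)"}),
]

def atoms_of(program):
    return {a for h, pos, neg in program for a in {h} | pos | neg}

def fitting_step(program, true, false):
    """One application of the Fitting operator; returns the pair (T_P(I), F_P(I))."""
    def body_true(pos, neg):
        return pos <= true and neg <= false
    def body_false(pos, neg):
        return bool(pos & false or neg & true)
    new_true = {h for h, pos, neg in program if body_true(pos, neg)}
    new_false = {a for a in atoms_of(program)
                 if all(body_false(pos, neg) for h, pos, neg in program if h == a)}
    return new_true, new_false

def fitting_model(program):
    """Iterate from the empty signed set until nothing changes (finite programs only)."""
    true, false = set(), set()
    while True:
        nxt = fitting_step(program, true, false)
        if nxt == (true, false):
            return true, false
        true, false = nxt

print(fitting_step(TWEETY1, set(), set()))
print(fitting_model(TWEETY1))
```

For a finite ground program the iteration stabilises after finitely many steps; the transfinite case discussed later in this section is outside the scope of the sketch.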


The development of the operator ®p somewhat parallels that of Tp except 
that there are two orderings involved, and the following result is analogous to 
Proposition 2.2.2. 


2.4.2 Proposition Let P be a normal logic program. Then the three-valued
models for P are exactly the pre-fixed points of Φ_P in the truth ordering ⊑_t.


Proof: Suppose that M is a three-valued interpretation for P satisfying
Φ_P(M) ⊑_t M, and let A ∈ B_P be arbitrary. Suppose that Φ_P(M)(A) = u.
Then we must have M(A) equal to u or to t. Since no clause A ← body
in ground(P) can have M(body) = t, otherwise Φ_P(M)(A) would be equal
to t, we must have M(body) equal to u or to f for each clause A ← body
in ground(P). But then, on recalling the truth value given to ← in
Definition 1.3.3, we see that A ← body is true in M. The other possible values
for Φ_P(M)(A) are handled similarly, and so M is a model for P.

The converse is also handled similarly, and we omit the details. ∎

2.4.3 Program Consider the following program P.


p aE 
PE 
qq 


rer 


Define M as follows: M(p) = f, M(q) = u, and M(r) = t. Then M is a
three-valued interpretation for P satisfying Φ_P(M) ⊑_k M, and yet M is not
a model for P.

On the other hand, take P to be Program 2.2.4. Define M as follows:
M(p) = t, M(q) = u, and M(r) = t. Then M is a three-valued model for P,
but it does not satisfy the inequality Φ_P(M) ⊑_k M.

Therefore, neither implication of Proposition 2.4.2 holds in the case of the
knowledge ordering.


The following fact about ®p is fundamental. 


2.4.4 Proposition Let P be a program. Then Φ_P is monotonic on I_{P,3} in
the knowledge ordering ⊑_k.


Proof: Let I, K ∈ I_{P,3} with I ⊆ K. We show Φ_P(I) ⊆ Φ_P(K). Let A ∈ Φ_P(I)
be an atom. Then A ∈ T_P(I). Therefore, there is a ground clause A ← body
such that body is true in I. From Table 1.1, each literal in body must be true
and therefore, noting the results of Section 1.3.3, must belong to I. Hence,
each literal in body belongs to K since I ⊆ K and is therefore true in K.
Hence, body is also true in K, and we obtain that A ∈ T_P(K) ⊆ Φ_P(K). Now
let ¬A ∈ Φ_P(I) be a negated atom. Then A ∈ F_P(I), and so, for all ground
clauses A ← body, we have that body is false in I. So, given such a clause,
from Table 1.1 we see that at least one literal L_i, say, in body, is false. Hence,
by the results of Section 1.3.3 again, we have ¬L_i ∈ I. But I ⊆ K and hence
¬L_i ∈ K. Therefore, L_i is also false in K, and consequently body is false in
K. Thus, we obtain A ∈ F_P(K), and hence ¬A ∈ Φ_P(K), as required. ∎


2.4.5 Example Take P to be Program 2.2.4 again. Define three-valued
interpretations I and K for P as follows: I(p) = I(q) = I(r) = f, and
K(p) = K(q) = K(r) = t. Then I ⊑_t K. Yet Φ_P(K) is constant with value f,
and Φ_P(I) is constant with value t. Hence, Φ_P(K) ⊑_t Φ_P(I), and so Φ_P is not
monotonic relative to the truth ordering.


Since the operator Φ_P is monotonic relative to the ordering ⊑_k, it has
a least fixed point by the Knaster-Tarski theorem, Theorem 1.1.10, and this
least fixed point is an ordinal power Φ_P ↑ α, as defined in Section 1.1, for
some ordinal α. The least fixed point of Φ_P is called the Kripke-Kleene model
or Fitting model for P. It turns out, as we show later, that Φ_P is not order
continuous, indeed not even ω-continuous, relative to ⊑_k, and so Kleene's
theorem, Theorem 1.1.9, is not generally applicable to Φ_P.


2.4.6 Proposition Let P be a program. Then every fixed point M of Φ_P is a
model for P with the following properties. (a) If A ∈ B_P is such that M(A) =
t, then there exists a clause A ← body in ground(P) with M(body) = t.
(b) If A ∈ B_P is such that for all clauses A ← body in ground(P) we have
M(body) = f, then M(A) = f.


Proof: Let A ← body be a clause in ground(P). If M(body) = t, then M(A) =
Φ_P(M)(A) = M(body) = t. If M(A) = f, then Φ_P(M)(A) = M(A) = f, and
hence M(body) = f. Finally, if M(A) = u, then Φ_P(M)(A) = M(A) = u,
and therefore M(body) = f or M(body) = u. By definition of the truth value
given to ←, we see that this suffices to show that M is a model for P.

In order to show (a), let A ∈ B_P, and suppose that M(A) = t. Then
Φ_P(M)(A) = M(A) = t, and there is a clause A ← body in ground(P) with
M(body) = t by definition of Φ_P.

To show (b), let A ∈ B_P, and assume that for all clauses A ← body in
ground(P) we have M(body) = f. Then M(A) = Φ_P(M)(A) = f, again by
definition of Φ_P. ∎


Proposition 2.4.6 shows that fixed points of ®p are three-valued supported 
models for P, meaning that they satisfy (a) and (b) of Proposition 2.4.6. Note 
that a total three-valued supported model is a supported model in the sense 
of Definition 2.2.5. 


2.4.7 Proposition Let P be a program. Then the fixed points of Φ_P are
exactly the three-valued supported models for P.


Proof: Certainly, every fixed point of Φ_P is a three-valued supported model
for P by Proposition 2.4.6. Conversely, let M be a three-valued supported
model for P, and let A ∈ B_P. If M(A) = t, then, by definition of a three-valued
supported model, there exists a clause A ← body in ground(P) such that
M(body) = t, and hence Φ_P(M)(A) = M(body) = t = M(A). If M(A) = f,
then for all clauses A ← body in ground(P) we have that M(body) = f, since
M is a model for P. Hence, Φ_P(M)(A) = M(body) = f = M(A). Finally, if
M(A) = u, then no clause A ← body in ground(P) has M(body) = t, since M
is a model for P, and by (b) not every such clause has M(body) = f, so that
Φ_P(M)(A) = u = M(A). It follows that M is a fixed point of Φ_P, as required. ∎


Before discussing further properties of the Fitting model, we give an
alternative characterization of it.

For a program P and a three-valued interpretation I ∈ I_{P,3}, an I-partial
level mapping for P is a partial mapping l : B_P → α with domain dom(l) =
{A | A ∈ I or ¬A ∈ I}, where α is some ordinal. Again, we extend every such
mapping to literals by setting l(¬A) = l(A) for all A ∈ dom(l).

2.4.8 Definition Let P be a normal logic program, let I be a three-valued
model for P, and let l be an I-partial level mapping for P. We say that P
satisfies (F) with respect to I and l if each A ∈ dom(l) satisfies one of the
following conditions.


(Fi) A ∈ I, and there is a clause A ← L_1,...,L_n in ground(P) such that
L_i ∈ I and l(A) > l(L_i) for i = 1,...,n.


(Fii) ¬A ∈ I, and for each clause A ← L_1,...,L_n in ground(P) there exists
i ∈ {1,...,n} with ¬L_i ∈ I and l(A) > l(L_i).


If A ∈ dom(l) satisfies (Fi), then we say that A satisfies (Fi) with respect to I
and l, with similar terminology if A ∈ dom(l) satisfies (Fii).


2.4.9 Theorem Let P be a normal logic program with Fitting model M_P.
Then, in the knowledge ordering ⊑_k, M_P is the greatest model among all
three-valued models I for which there exists an I-partial level mapping l for
P such that P satisfies (F) with respect to I and l.


Proof: We have M_P = Φ_P ↑ α for some ordinal α, and indeed α may be
taken to be the closure ordinal for M_P. Define the M_P-partial level mapping
l_P : B_P → α as follows: l_P(A) = β, where β is the least ordinal such that
A is not undefined in Φ_P ↑ (β + 1). The proof will be established by showing
the following facts. (1) P satisfies (F) with respect to M_P and l_P. (2) If I is
a three-valued model for P and l is an I-partial level mapping such that P
satisfies (F) with respect to I and l, then I ⊆ M_P.

(1) Let A ∈ dom(l_P), and suppose that l_P(A) = β. We consider the two
cases corresponding to (Fi) and (Fii).

Case (Fi). If A ∈ M_P, then A ∈ T_P(Φ_P ↑ β). Hence, there exists a clause
A ← body in ground(P) such that body is true in Φ_P ↑ β. Therefore, for all
L_i ∈ body, we have that L_i ∈ Φ_P ↑ β, and hence l_P(L_i) < β, and also that
L_i ∈ M_P for all i. Consequently, A satisfies (Fi) with respect to M_P and l_P.

Case (Fii). If ¬A ∈ M_P, then A ∈ F_P(Φ_P ↑ β). Hence, for each clause
A ← body in ground(P), there is a literal L ∈ body with ¬L ∈ Φ_P ↑ β. But
then l_P(L) < β and ¬L ∈ M_P. Consequently, A satisfies (Fii) with respect to
M_P and l_P, and we have established that fact (1) holds.

(2) We show via transfinite induction on β = l(A) that, whenever A ∈ I,
or ¬A ∈ I, we have A ∈ Φ_P ↑ (β + 1), or ¬A ∈ Φ_P ↑ (β + 1), respectively. For
the base case, note that if l(A) = 0, then A ∈ I implies that A occurs as the
head of a fact in ground(P), hence A ∈ Φ_P ↑ 1, and ¬A ∈ I implies that there
is no clause with head A in ground(P), hence ¬A ∈ Φ_P ↑ 1. So assume now
that the induction hypothesis holds for all B ∈ B_P with l(B) < β and that
l(A) = β. We consider two cases.

Case i. If A ∈ I, then it satisfies (Fi) with respect to I and l. Hence, there
is a clause A ← body in ground(P) such that body ⊆ I and l(K) < β for all
K ∈ body. Hence, body ⊆ M_P by the induction hypothesis, and since M_P is
a model for P, we obtain A ∈ M_P.


Case ii. If ¬A ∈ I, then A satisfies (Fii) with respect to I and l. Hence,
for each clause A ← body in ground(P), there is K ∈ body with ¬K ∈ I and
l(K) < β. But then, by the induction hypothesis, we have ¬K ∈ M_P, and
consequently for each clause A ← body in ground(P) we obtain that body is
false in M_P. Since M_P = Φ_P(M_P) is a fixed point of the Φ_P-operator, we
obtain ¬A ∈ M_P. This establishes fact (2) and concludes the proof. ∎
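
Part (1) of the proof can be checked on a concrete program. The sketch below, under the same assumptions as before (a finite ground program, with ground(Tweety1) reconstructed from Example 2.4.1), computes the level mapping l_P from the stages of the iteration and then tests conditions (Fi) and (Fii) directly. The function names are hypothetical.

```python
# Ground clauses as (head, positive body atoms, negated body atoms).
# Assumption: this is ground(Tweety1), reconstructed from Example 2.4.1.
TWEETY1 = [
    ("penguin(tweety)", set(), set()),
    ("bird(bob)", set(), set()),
    ("bird(tweety)", {"penguin(tweety)"}, set()),
    ("bird(bob)", {"penguin(bob)"}, set()),
    ("flies(tweety)", {"bird(tweety)"}, {"penguin(tweety)"}),
    ("flies(bob)", {"bird(bob)"}, {"penguin(bob)"}),
]

def phi(program, true, false):
    """Fitting operator on the signed set (true, false)."""
    atoms = {a for h, pos, neg in program for a in {h} | pos | neg}
    t = {h for h, pos, neg in program if pos <= true and neg <= false}
    f = {a for a in atoms
         if all(pos & false or neg & true for h, pos, neg in program if h == a)}
    return t, f

def fitting_levels(program):
    """Return (l_P, true, false); l_P(A) is the least beta with A defined at stage beta + 1."""
    true, false, level, beta = set(), set(), {}, 0
    while True:
        t, f = phi(program, true, false)
        for a in (t | f) - set(level):
            level[a] = beta
        if (t, f) == (true, false):
            return level, true, false
        true, false, beta = t, f, beta + 1

def satisfies_F(program, true, false, level):
    """Check (Fi) for true atoms and (Fii) for false atoms of the signed set."""
    for a in true | false:
        bodies = [(pos, neg) for h, pos, neg in program if h == a]
        if a in true:   # (Fi): some clause with all body literals in I at lower levels
            ok = any(pos <= true and neg <= false
                     and all(level[b] < level[a] for b in pos | neg)
                     for pos, neg in bodies)
        else:           # (Fii): every clause has a body literal falsified at a lower level
            ok = all(any(level[b] < level[a] for b in (pos & false) | (neg & true))
                     for pos, neg in bodies)
        if not ok:
            return False
    return True

level, true, false = fitting_levels(TWEETY1)
print(level, satisfies_F(TWEETY1, true, false, level))
```

For this program, the atoms decided in the first application of the operator receive level 0 and the remaining three atoms receive level 1.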


The following corollary follows immediately as a special case of the previous 
result. 


2.4.10 Corollary A normal logic program P has a total Fitting model if and
only if there is a total model I for P and a (total) level mapping l for P such
that P satisfies (F) with respect to I and l.


2.4.11 Example Example 2.4.1 shows that Tweety1 (Program 2.1.2) has
total Fitting model M ∪ ¬(B_Tweety1 \ M), where M is as in Example 2.2.7.
Tweety2 (Program 2.3.9) has Fitting model


{penguin(tweety), bird(bob), bird(tweety), ¬flies(tweety)}.


Thus, we cannot decide whether or not bob is a penguin. Hence, the Fitting
semantics suffers from the same deficiency as the supported model semantics,
see our discussion of Program 2.3.9.

Tweety3 (Program 2.3.10) has ∅ as its Fitting model.


The Fitting operator is not ω-continuous in general, not even for definite
programs, as shown by the next example.


2.4.12 Program Consider the program P consisting of the following clauses.


p(s(X)) ← p(X)
q ← p(X)


Then Φ_P ↑ n = {¬p(s^k(0)) | k < n} for all n ∈ N and Φ_P ↑ ω = {¬p(s^n(0)) |
n ∈ N}. However, Φ_P ↑ (ω + 1) = {¬q, ¬p(s^n(0)) | n ∈ N} is the least fixed
point of the operator.


The Fitting operator can be thought of as an approximation to the immediate
consequence operator, in the sense of the following proposition.


2.4.13 Proposition Let P be a program. Then for all I ∈ I_{P,3}, we have that
Φ_P(I)^+ ⊆ T_P(I^+) ⊆ B_P \ Φ_P(I)^-. Furthermore, the Fitting operator maps
total interpretations to total interpretations and coincides with the immediate
consequence operator on these.

Proof: Let I = I^+ ∪ ¬I^- be a three-valued interpretation, and let A ∈
Φ_P(I)^+. Then there is a clause A ← body in ground(P), where body equals
A_1,...,A_n, ¬B_1,...,¬B_k, say, and is true in the three-valued interpretation
I. Therefore, for all i and j, we have A_i ∈ I^+ and B_j ∈ I^- so that A_i ∈ I^+
and B_j ∉ I^+. Therefore, body is true in the two-valued interpretation I^+, and
so A ∈ T_P(I^+). Conversely, if I is total, then B_j ∉ I^+ means that B_j ∈ I^-,
and hence whenever A ∈ T_P(I^+) we have A ∈ Φ_P(I)^+. This deals with the
first inclusion.

For the second inclusion, A ∈ Φ_P(I)^- if and only if for all clauses A ←
body in ground(P) we have body false in the three-valued interpretation I.
But then one of the literals in body is false, and so, using the notation already
established for body, either some A_i ∈ I^- or some B_j ∈ I^+, that is, either
some A_i ∉ I^+ or some B_j ∈ I^+. Therefore, body is also false in the
two-valued interpretation I^+, leading to A ∉ T_P(I^+). We thus obtain Φ_P(I)^- ⊆
B_P \ T_P(I^+) so that T_P(I^+) ⊆ B_P \ Φ_P(I)^-. If I is total, then B_P \ Φ_P(I)^- =
Φ_P(I)^+ = T_P(I) = T_P(I^+). ∎
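
Proposition 2.4.13 lends itself to an exhaustive check on a small program. The following sketch, which is an illustration rather than part of the proof, runs through all 81 three-valued interpretations of a ground version of Tweety3 (Program 2.3.10) over the constant tweety and tests both inclusions, together with the statement about total interpretations. The encoding and the function names are our own.

```python
from itertools import product

# Ground Tweety3 over the constant tweety (the argument is dropped from atom names).
# Each clause is (head, positive body atoms, negated body atoms).
TWEETY3 = [
    ("eagle", set(), {"penguin"}),
    ("penguin", set(), {"eagle"}),
    ("bird", {"eagle"}, set()),
    ("bird", {"penguin"}, set()),
    ("flies", {"bird"}, {"penguin"}),
]
ATOMS = {"eagle", "penguin", "bird", "flies"}

def t_p(program, i):
    """Two-valued immediate consequence operator applied to the set i of true atoms."""
    return {h for h, pos, neg in program if pos <= i and not (neg & i)}

def phi(program, atoms, true, false):
    """Fitting operator; returns the positive and negative parts of the result."""
    plus = {h for h, pos, neg in program if pos <= true and neg <= false}
    minus = {a for a in atoms
             if all(pos & false or neg & true for h, pos, neg in program if h == a)}
    return plus, minus

def proposition_holds(program, atoms):
    ordered = sorted(atoms)
    for values in product("tfu", repeat=len(ordered)):
        true = {a for a, v in zip(ordered, values) if v == "t"}
        false = {a for a, v in zip(ordered, values) if v == "f"}
        plus, minus = phi(program, atoms, true, false)
        # the two inclusions of the proposition
        if not (plus <= t_p(program, true) <= atoms - minus):
            return False
        # on total interpretations the result is total and agrees with T_P
        if true | false == atoms:
            if plus != t_p(program, true) or plus | minus != atoms:
                return False
    return True

print(proposition_holds(TWEETY3, ATOMS))
```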


From Proposition 2.4.13, we immediately obtain that total Fitting models 
are always supported. They are, in fact, also stable in general, as we will see 
later in Section 2.6. However, if a program has a unique stable model, it does 
not necessarily have a total Fitting model. 


2.4.14 Program The program consisting of the three clauses


p ← ¬q
q ← ¬p
p ← ¬p


has unique (two-valued) supported model {p}, which is also stable. However,
its (three-valued) Fitting model is everywhere equal to u.
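
These three claims can be verified by enumeration. The sketch below reads the clauses of Program 2.4.14 as p ← ¬q, q ← ¬p and p ← ¬p, which is the reading consistent with the claims just made; it takes supported models to be the fixed points of the two-valued immediate consequence operator and stable models to be defined via the Gelfond-Lifschitz reduct. All function names are illustrative.

```python
from itertools import combinations

# Clauses as (head, positive body atoms, negated body atoms).
P = [("p", set(), {"q"}), ("q", set(), {"p"}), ("p", set(), {"p"})]
ATOMS = ["p", "q"]

def subsets(atoms):
    return [set(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]

def t_p(program, i):
    """Two-valued immediate consequence operator."""
    return {h for h, pos, neg in program if pos <= i and not (neg & i)}

def least_model_of_reduct(program, m):
    """Least model of the Gelfond-Lifschitz reduct of the program with respect to m."""
    definite = [(h, pos) for h, pos, neg in program if not (neg & m)]
    model = set()
    while True:
        derived = {h for h, pos in definite if pos <= model}
        if derived <= model:
            return model
        model |= derived

def fitting_model(program, atoms):
    """Least fixed point of the Fitting operator, as a pair (true atoms, false atoms)."""
    true, false = set(), set()
    while True:
        t = {h for h, pos, neg in program if pos <= true and neg <= false}
        f = {a for a in atoms
             if all(pos & false or neg & true for h, pos, neg in program if h == a)}
        if (t, f) == (true, false):
            return true, false
        true, false = t, f

supported = [m for m in subsets(ATOMS) if t_p(P, m) == m]
stable = [m for m in subsets(ATOMS) if least_model_of_reduct(P, m) == m]
print(supported, stable, fitting_model(P, ATOMS))
```

The Fitting model is returned as a pair of empty sets, that is, both atoms remain undefined.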


2.5 Perfect Models 


The approach using three-valued models, which was presented in Section 
2.4, has the advantage that a unique model, namely, the least fixed point of 
the Fitting operator, or the Fitting model, can be associated with each given 
program. This avoids the ambiguity present in semantics based on classical 
logic, such as the stable model semantics, where a program may have many 
associated models. 

An alternative way of avoiding this problem is to restrict syntax of
programs in such a way that only programs are allowed whose semantics is
unambiguous. The restriction is usually put in place by conditions which prevent
recursion in certain situations, and the most convenient way of expressing
these conditions is again by the use of level mappings. For example, the
alternative characterization of the Fitting model in Definition 2.4.8 and Theorem
2.4.9 can be viewed from this standpoint, and we will return to this point later
on in this section and in Section 2.6.

The approach which we present in this section is based on the following
idea: the introduction of negation, and in particular the possibility of
allowing recursive dependencies between negated atoms, causes ambiguity from a
declarative point of view. However, if recursion is only allowed through
positive atoms, a standard model, namely, the least model, can be obtained. So it
seems natural to disallow recursion through negative dependencies, while at
the same time allowing recursion through positive ones. This idea is captured
in the following definition.


2.5.1 Definition A program P is called locally stratified¹¹ if there exists a
level mapping l : B_P → α such that for each clause


A ← A_1,...,A_n, ¬B_1,...,¬B_m


in ground(P) the following hold.
(S1) l(A) ≥ l(A_i) for i = 1,...,n.
(S2) l(A) > l(B_j) for j = 1,...,m.


Furthermore, P is called stratified if it is locally stratified, and for all atoms
A, B ∈ B_P with the same predicate symbol, we have l(A) = l(B).
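
For a finite ground program the definition can be tested directly. The sketch below reads (S1) as a non-strict and (S2) as a strict inequality; the ground versions of Tweety2 and Tweety3 are our reconstructions from the clauses shown elsewhere in this chapter, and the function names are hypothetical. The brute-force search relies on the observation that, for n ground atoms, levels 0,...,n-1 suffice, since only the relative order of the levels matters.

```python
from itertools import product

# Ground clauses as (head, positive body atoms, negated body atoms).
TWEETY2 = [
    ("penguin(tweety)", set(), set()),
    ("bird(bob)", set(), set()),
    ("bird(tweety)", {"penguin(tweety)"}, set()),
    ("bird(bob)", {"penguin(bob)"}, set()),
    ("flies(tweety)", {"bird(tweety)"}, {"penguin(tweety)"}),
    ("flies(bob)", {"bird(bob)"}, {"penguin(bob)"}),
    ("penguin(bob)", {"penguin(bob)"}, set()),
]
TWEETY3 = [
    ("eagle(tweety)", set(), {"penguin(tweety)"}),
    ("penguin(tweety)", set(), {"eagle(tweety)"}),
    ("bird(tweety)", {"eagle(tweety)"}, set()),
    ("bird(tweety)", {"penguin(tweety)"}, set()),
    ("flies(tweety)", {"bird(tweety)"}, {"penguin(tweety)"}),
]
# A level mapping that depends only on the predicate symbol (hence a stratification).
LEVEL = {
    "penguin(tweety)": 0, "penguin(bob)": 0,
    "bird(tweety)": 1, "bird(bob)": 1,
    "flies(tweety)": 2, "flies(bob)": 2,
}

def respects(program, level):
    """Check (S1) and (S2) for every ground clause."""
    return all(
        all(level[h] >= level[a] for a in pos) and all(level[h] > level[b] for b in neg)
        for h, pos, neg in program
    )

def locally_stratified(program):
    """Brute-force search for a suitable level mapping (finite ground programs only)."""
    atoms = sorted({a for h, pos, neg in program for a in {h} | pos | neg})
    n = len(atoms)
    return any(respects(program, dict(zip(atoms, levels)))
               for levels in product(range(n), repeat=n))

print(respects(TWEETY2, LEVEL), locally_stratified(TWEETY3))
```

Tweety3 fails because its first two clauses would require l(eagle(tweety)) > l(penguin(tweety)) > l(eagle(tweety)).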


Note that for stratified programs the image of the level mapping involved 
is finite, in contrast to locally stratified programs. Stratified programs are 
particularly interesting from the procedural point of view. Nevertheless, we 
will concentrate here on the more general locally stratified programs. 

Along with the introduction of locally stratified programs, a semantics was 
developed called the perfect model semantics. We will discuss this semantics 
only in passing in this chapter. Indeed, we will focus here on the more general 
weakly perfect semantics, which is introduced later in this section and is also 
defined for locally stratified programs. However, we will consider the perfect 
model semantics in some detail in Section 6.3. 


2.5.2 Definition Let P be a locally stratified program, and let l denote the
associated level mapping. Given two distinct models M and N for P, we say
that N is preferable to M if, for every ground atom A in N \ M, there is a
ground atom B in M \ N such that l(A) > l(B). A model M for P is called
perfect if there are no models for P preferable to M.
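
On a small ground program the perfect models can be found by enumerating all two-valued models and discarding those to which some other model is preferable. The sketch below does this for a ground version of Tweety2, reconstructed as before, with a level mapping of our own choosing that assigns 0 to penguin atoms, 1 to bird atoms and 2 to flies atoms.

```python
from itertools import combinations

# Ground clauses as (head, positive body atoms, negated body atoms).
TWEETY2 = [
    ("penguin(tweety)", set(), set()),
    ("bird(bob)", set(), set()),
    ("bird(tweety)", {"penguin(tweety)"}, set()),
    ("bird(bob)", {"penguin(bob)"}, set()),
    ("flies(tweety)", {"bird(tweety)"}, {"penguin(tweety)"}),
    ("flies(bob)", {"bird(bob)"}, {"penguin(bob)"}),
    ("penguin(bob)", {"penguin(bob)"}, set()),
]
LEVEL = {
    "penguin(tweety)": 0, "penguin(bob)": 0,
    "bird(tweety)": 1, "bird(bob)": 1,
    "flies(tweety)": 2, "flies(bob)": 2,
}

def is_model(program, m):
    """Two-valued model check: every clause with a true body has a true head."""
    return all(h in m or not (pos <= m and not (neg & m)) for h, pos, neg in program)

def preferable(n, m, level):
    """N is preferable to M in the sense of Definition 2.5.2."""
    return n != m and all(any(level[a] > level[b] for b in m - n) for a in n - m)

def perfect_models(program, level):
    atoms = sorted(level)
    models = [set(c) for r in range(len(atoms) + 1)
              for c in combinations(atoms, r) if is_model(program, set(c))]
    return [m for m in models if not any(preferable(n, m, level) for n in models)]

print(perfect_models(TWEETY2, LEVEL))
```

With this level mapping the search returns a single model, in which bob flies and is not a penguin.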


¹¹ The notion of local stratification and the perfect model semantics were introduced in
the paper [Przymusinski, 1988]. Stratified programs and certain procedural aspects of them
were studied in [Apt et al., 1988].

2.5.3 Example Tweety2 (Program 2.3.9) is locally stratified, indeed
stratified, since flies depends both on penguin and on bird, where "depends
on" is defined below, bird depends only on penguin, and penguin does not
depend on any predicate symbol other than itself. We will see in Example
6.3.12 that it has M from Example 2.2.7 as its perfect model.

Tweety3 (Program 2.3.10) is obviously not locally stratified.


We will see later in Section 6.3 that every locally stratified program has 
a unique perfect model and that this model is independent of the choice of 
the level mapping with respect to which the program is locally stratified. In 
fact, we are more interested here in a generalization of the perfect model 
semantics to three-valued logic, and of course the objective underlying this 
generalization is the usual one, namely, to provide a single intended model for 
each given program. 

We will proceed next with presenting the rather involved definition of the
weakly perfect model due to Przymusinska and Przymusinski.¹² For ease of
notation, it will be convenient to consider (countably infinite) propositional
programs instead of programs over a first-order language, and we recall that
we have already observed in Section 2.1 that this results in no loss of generality
for our purposes.

Let P be a (countably infinite propositional) normal logic program. We
say that an atom A ∈ B_P refers to an atom B ∈ B_P if either B or ¬B occurs
as a body literal in a clause A ← body in P with head A. We say that A refers
negatively to B if ¬B occurs as a body literal in such a clause. We say that A
depends on B, written B ≤ A, if the pair (A, B) is in the transitive closure of
the relation refers to. We say that A depends negatively on B, written B < A,
if there are C, D ∈ B_P such that C refers negatively to D and the following
conditions hold: (1) C ≤ A or C = A (the latter meaning identity), and (2)
B ≤ D or B = D. For A, B ∈ B_P, we write A ~ B if either A = B or A and B
depend negatively on each other, so that A < B and B < A both hold in this
latter case.¹³ The relation ~ is an equivalence relation, and its equivalence
classes are called components of P. A component is trivial if it consists of a
single element A with A ≮ A.

Notice that the definitions above can be viewed in a rather intuitive way
by means of the dependency graph G_P of a program P, defined as follows. The
vertices of G_P are the ground atoms appearing in P; for each clause A ← body
in ground(P) there is a positive directed edge in G_P from B to A if B occurs
in body, and there is a negative directed edge from B to A in G_P if ¬B occurs
in body. Then, in these terms, we have B ≤ A if and only if there is a directed
path in G_P from B to A, and we have B < A if and only if there is a directed
path in G_P from B to A passing through a negative edge.


¹² The notions of weak stratification and the weakly perfect model were introduced in the
paper [Przymusinska and Przymusinski, 1990].

¹³ It is noted in [Przymusinska and Przymusinski, 1990] that such mutual recursion is the
primary cause of difficulties in defining declarative semantics for logic programs.

Let C_1 and C_2 be two components of a program P. We write C_1 ≺ C_2 if
and only if C_1 ≠ C_2 and for each A_1 ∈ C_1 there is A_2 ∈ C_2 with A_1 < A_2. A
component C_1 is called minimal if there is no component C_2 with C_2 ≺ C_1.

Given a normal logic program P, the bottom stratum S(P) of P is the union 
of all minimal components of P. The bottom layer of P is the subprogram L(P) 
of P which consists of all clauses from P with heads belonging to S(P). 

Given a three-valued interpretation I for P, thought of as a signed subset,
we define the reduct of P with respect to I to be the program P/I obtained
from P by performing the following reductions. (1) Remove from P all clauses
which contain a body literal L such that ¬L ∈ I or whose head belongs to
I. (2) Remove from all remaining clauses all body literals L with L ∈ I. (3)
Remove from the resulting program all non-unit clauses whose heads appear
also as heads of unit clauses in the program.

Note that the definition of P/I used here differs from that given in
Definition 2.3.6 in the context of stable models. The new definition just given will
only be used in the present section.


2.5.4 Definition The weakly perfect model M_P for a program P is defined by
transfinite induction as follows. Let P_0 = P, and let M_0 = ∅. For each
(countable) ordinal α > 0 such that programs P_δ and three-valued interpretations
M_δ have already been defined for all δ < α, let


N_α = ⋃_{δ<α} M_δ,
P_α = P/N_α,


R_α is the set of all atoms which are undefined in N_α and were eliminated from
P by reducing it with respect to N_α,


S_α = S(P_α), and
L_α = L(P_α).


The construction then proceeds with one of the following three cases. (1) If
P_α is empty, then the construction stops, and M_P = N_α ∪ ¬R_α is the (total)
weakly perfect model for P. (2) If the bottom stratum S_α is empty or if the
bottom layer L_α contains a negative literal, then the construction also stops,
and M_P = N_α ∪ ¬R_α is the (partial) weakly perfect model for P. (3) In the
remaining case, L_α is a definite program, and we define M_α = H ∪ ¬R_α, where
H is the total three-valued model corresponding to the least two-valued model
for L_α, and the construction continues.

For every α, the set S_α ∪ R_α is called the α-th stratum of P, and the
program L_α is called the α-th layer of P.


We now present a detailed example of the calculation of the weakly perfect 
model; see also Program 2.6.12 for further discussion of this example. 

2.5.5 Example Consider the program Tweety4, as follows; it is a modification
of Tweety2 (Program 2.3.9), where the last clause has been changed.


penguin(tweety) ←
bird(bob) ←
bird(X) ← penguin(X)
flies(X) ← bird(X), ¬penguin(X)
penguin(bob) ← penguin(bob), ¬flies(bob)


This program has the weakly perfect model
{bird(bob), bird(tweety), penguin(tweety), ¬flies(tweety)},


and we show here how this model is calculated. We begin by setting P = P_0 =
ground(Tweety4), as follows.


penguin(tweety) ←
bird(bob) ←
bird(tweety) ← penguin(tweety)
bird(bob) ← penguin(bob)
flies(tweety) ← bird(tweety), ¬penguin(tweety)
flies(bob) ← bird(bob), ¬penguin(bob)
penguin(bob) ← penguin(bob), ¬flies(bob)


Next, we set M_0 = ∅ and carry out reduction of P_0 with respect to M_0
to obtain P_1 = P_0/M_0, which turns out to be equal to P_0 with the fourth
clause removed. The dependency graph G_{P_1} of P_1 is shown in Figure 2.1,
where we use the obvious abbreviations for the ground atoms in P_1 such
as p(t) for penguin(tweety) and so on. Using G_{P_1}, it is simple to check that
the components of P_1 are {bird(bob)}, {bird(tweety)}, {penguin(tweety)},
{flies(tweety)}, and {flies(bob), penguin(bob)} and that the minimal
components are the first three of these. Therefore, the bottom stratum
S_1 = S(P_1) of P_1 is {penguin(tweety), bird(bob), bird(tweety)}. Hence,
the bottom layer L_1 = L(P_1) of P_1 is the definite program


penguin(tweety) ←
bird(bob) ←
bird(tweety) ← penguin(tweety)


whose least two-valued model is clearly equal to S_1. Note that N_1 =
⋃_{δ<1} M_δ = M_0 = ∅. Reduction of P_0 with respect to M_0 removed one clause,
but did not eliminate any atoms from P; hence, R_1 = ∅. Since L_1 is definite,
we put M_1 = H ∪ ¬R_1, where H is the total three-valued model corresponding
to S_1; thus, M_1 = S_1, and the process continues.

FIGURE 2.1: Dependency graph for P_1 (vertices p(t), p(b), b(t), b(b), f(t), f(b); diagram not reproduced).


FIGURE 2.2: Dependency graph for P_2 (vertices p(b), f(b); diagram not reproduced).


The program P_2 = P/M_1 is


flies(bob) ← ¬penguin(bob)
penguin(bob) ← penguin(bob), ¬flies(bob)


The dependency graph G_{P_2} of P_2, shown in Figure 2.2, has only one
component {penguin(bob), flies(bob)}, which is therefore equal to the
bottom stratum S_2 = S(P_2) of P_2. Furthermore, N_2 = M_0 ∪ M_1 = M_1 and
R_2 = {flies(tweety)}. Since the bottom layer L_2 = L(P_2) is equal to P_2,
it is not definite. Therefore, the construction stops, and the weakly perfect
model is N_2 ∪ ¬R_2 = M_1 ∪ ¬R_2, as claimed.
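
The whole construction of Definition 2.5.4 can be sketched for finite ground programs as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the abbreviations p(t), b(b) and so on follow Figure 2.1, and we assume that H assigns false to every atom of the bottom stratum that is not in the least model of the layer.

```python
# Ground Tweety4 as (head, positive body atoms, negated body atoms).
P0 = [
    ("p(t)", set(), set()),
    ("b(b)", set(), set()),
    ("b(t)", {"p(t)"}, set()),
    ("b(b)", {"p(b)"}, set()),
    ("f(t)", {"b(t)"}, {"p(t)"}),
    ("f(b)", {"b(b)"}, {"p(b)"}),
    ("p(b)", {"p(b)"}, {"f(b)"}),
]

def atoms_of(program):
    return {a for h, pos, neg in program for a in {h} | pos | neg}

def reduct(program, true, false):
    """P/I of this section for the signed set I = true ∪ ¬false."""
    kept = []
    for h, pos, neg in program:
        # (1) drop clauses whose head is in I or with a body literal L such that ¬L is in I
        if h in true or pos & false or neg & true:
            continue
        # (2) delete the body literals that belong to I
        kept.append((h, pos - true, neg - false))
    # (3) drop non-unit clauses whose head is also the head of a unit clause
    facts = {h for h, pos, neg in kept if not pos and not neg}
    return [(h, pos, neg) for h, pos, neg in kept
            if h not in facts or not (pos or neg)]

def bottom_stratum(program):
    """Union of the minimal components of the program."""
    atoms = atoms_of(program)
    dep = {a: set() for a in atoms}           # dep[a]: atoms that a depends on
    for h, pos, neg in program:
        dep[h] |= pos | neg
    changed = True
    while changed:                             # transitive closure of "refers to"
        changed = False
        for a in atoms:
            new = set().union(*(dep[b] for b in dep[a])) - dep[a]
            if new:
                dep[a] |= new
                changed = True
    lt = set()                                 # (b, a) in lt means b < a
    for c, pos, neg in program:
        for d in neg:                          # c refers negatively to d
            above = {a for a in atoms if a == c or c in dep[a]}
            below = {b for b in atoms if b == d or b in dep[d]}
            lt |= {(b, a) for b in below for a in above}
    comps = {frozenset({a} | {b for b in atoms if (a, b) in lt and (b, a) in lt})
             for a in atoms}
    def precedes(c1, c2):                      # c1 ≺ c2
        return c1 != c2 and all(any((x, y) in lt for y in c2) for x in c1)
    minimal = [c for c in comps if not any(precedes(d, c) for d in comps)]
    return set().union(*minimal)

def least_model(definite):
    model = set()
    while True:
        derived = {h for h, pos, neg in definite if pos <= model}
        if derived <= model:
            return model
        model |= derived

def weakly_perfect_model(program):
    """Return (true atoms, false atoms) of the weakly perfect model."""
    true, false = set(), set()                 # N_alpha as a signed set
    while True:
        p_alpha = reduct(program, true, false)
        r_alpha = atoms_of(program) - true - false - atoms_of(p_alpha)
        stratum = bottom_stratum(p_alpha)
        layer = [c for c in p_alpha if c[0] in stratum]
        # cases (1) and (2): empty program, empty stratum, or a non-definite layer
        if not p_alpha or not stratum or any(neg for h, pos, neg in layer):
            return true, false | r_alpha
        h_true = least_model(layer)            # case (3): the layer is definite
        true |= h_true
        false |= (stratum - h_true) | r_alpha  # assumption: H is total on the stratum

print(weakly_perfect_model(P0))
```

On Tweety4 the loop runs twice, mirroring the two stages of the example: the first pass adds the bottom stratum of P_1 to the model, and the second pass stops because the remaining layer contains negative literals.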


2.5.6 Proposition Let P be a program, and let M be its (partial) weakly 
perfect model. Then M is a model with respect to Kleene’s strong three-valued 
logic. 


Proof: It is straightforward to show that Φ_P(M) = M, and we leave the
details to the reader. ∎


A weakly stratified program is a program with a total weakly perfect model. 
The set of all its strata is then called its weak stratification. 


2.5.7 Remark We remark that our definition of weakly perfect model, as
given in Definition 2.5.4, differs slightly from the version introduced in
[Przymusinska and Przymusinski, 1990]. In order to obtain the original
definition, points (2) and (3) of Definition 2.5.4 have to be replaced with the
following: (2)' If the bottom stratum S_α is empty or if the bottom layer L_α has
no least two-valued model, then the construction stops, and M_P = N_α ∪ ¬R_α
is the (partial) weakly perfect model for P. (3)' In the remaining case, L_α
has a least two-valued model, and we define M_α = H ∪ ¬R_α, where H is
the three-valued model for L_α corresponding to its least two-valued model,
and the construction continues. The original definition is more general due to
the fact that every definite program has a least two-valued model. However,
while the least two-valued model for a definite program can be obtained as
the least fixed point of the monotonic (and even continuous) operator T_P,
we know of no similar result, nor of a general operator, for obtaining the
least two-valued model, if it exists, for programs which are not definite. The
original definition therefore seems to be rather awkward, and indeed, even in
[Przymusinska and Przymusinski, 1990], when defining weakly stratified
programs, the more general version was dropped in favour of requiring definite
layers. So Definition 2.5.4 is an adaptation taking the original notion of weakly
stratified program into account and appears to be more natural. Our use,
therefore, of the term weakly perfect model will refer to Definition 2.5.4 unless
stated to the contrary.


Again, an alternative characterization of the weakly perfect model can be 
provided using level mappings. 


2.5.8 Definition Let P be a normal logic program, let I be a three-valued
model for P, and let l be an I-partial level mapping for P. We say that P
satisfies (WS) with respect to I and l if each A ∈ dom(l) satisfies one of the
following conditions.


(WSi) A ∈ I, and there is a clause A ← L_1,...,L_n in ground(P) such that
L_i ∈ I and l(A) > l(L_i) for all i.


(WSii) ¬A ∈ I, and for each clause A ← A_1,...,A_n, ¬B_1,...,¬B_m in
ground(P) one (at least) of the following conditions holds.
(WSiia) There exists i with ¬A_i ∈ I and l(A) > l(A_i).


(WSiib) For all k we have l(A) ≥ l(A_k), for all j we have l(A) >
l(B_j), and there exists i with ¬A_i ∈ I.


(WSiic) There exists j with B_j ∈ I and l(A) > l(B_j).


Noting that the condition (Fii) in Definition 2.4.8 implies that either 
(WSiia) or (WSiic) holds, we see that the condition (WSii) above is more 
general than (Fii); conditions (WSi) and (Fi) are identical. 


2.5.9 Theorem Let P be a normal logic program with weakly perfect model
M_P. Then, in the knowledge ordering ⊑_k, M_P is the greatest model among all
models I for which there exists an I-partial level mapping l for P such that
P satisfies (WS) with respect to I and l.


We prepare for the proof of Theorem 2.5.9 by introducing some notation 
which will help make the presentation transparent. 

It will be convenient to consider level mappings which map into pairs (β, n)
of ordinals, where n ≤ ω. So let α be a (countable) ordinal, and consider the
set 𝒜 of all pairs (β, n), where β < α and n ≤ ω. Of course, 𝒜 endowed with
the lexicographic ordering is isomorphic to an ordinal. So any mapping from
B_P to 𝒜 can be considered to be a level mapping.

Let P be a program with (partial) weakly perfect model M_P. We define
the M_P-partial level mapping l_P as follows: l_P(A) = (β, n), where A ∈ S_β ∪ R_β
and n is least with A ∈ T_{L_β} ↑ (n + 1), if such an n exists, and n = ω otherwise.
We observe that if l_P(A) = l_P(B), then there exists α with A, B ∈ S_α ∪ R_α,
and if A ∈ S_α ∪ R_α and B ∈ S_β ∪ R_β with α < β, then l(A) < l(B).

The following notion will help to ease later notation. 


2.5.10 Definition Let P and Q be two programs, and let I be an
interpretation.


(1) Suppose that C_1 = (A ← L_1,...,L_n) and C_2 = (B ← K_1,...,K_m) are
two clauses. Then we say that C_1 subsumes C_2, written C_1 ≼ C_2, if A = B
and {L_1,...,L_n} ⊆ {K_1,...,K_m}.


(2) We say that P subsumes Q, written P ≼ Q, if for each clause C_1 in P
there exists a clause C_2 in Q with C_1 ≼ C_2.


(3) We say that P subsumes Q model-consistently (with respect to I), written
P ≼_I Q, if the following conditions hold.


(i) For each clause C_1 = (A ← L_1,...,L_n) in P, there exists a clause
C_2 = (B ← K_1,...,K_m) in Q with C_1 ≼ C_2 and {K_1,...,K_m} \
{L_1,...,L_n} ⊆ I.


(ii) For each clause C_2 = (B ← K_1,...,K_m) in Q which satisfies
{K_1,...,K_m} ⊆ I and B ∉ I, there exists a clause C_1 in P such
that C_1 ≼ C_2.


Definition 2.5.10 will facilitate the proof of Theorem 2.5.9 by employing 
the following lemma. 


2.5.11 Lemma With the notation established in Definition 2.5.4, we have
P/N_α ≼_{N_α} P for all α.


Proof: Condition 3(i) of Definition 2.5.10 holds because every clause C_1 =
(A ← L_1,...,L_n) in P/N_α is obtained from a clause C_2 = (A ← K_1,...,K_m)
in P by deleting body literals which are contained in N_α. Clearly, C_1 ≼ C_2,
and the set difference {K_1,...,K_m} \ {L_1,...,L_n} contains only elements of
N_α. Condition 3(ii) holds because for each clause C_2 = (A ← K_1,...,K_m) in
P with head A ∉ N_α whose body is true under N_α, Step 2 in the reduction
of P with respect to N_α removes all the body literals K_i. Therefore, we have
that C_1 = (A ←) is a fact in P/N_α, and clearly, C_1 ≼ C_2. ∎


The next lemma establishes the induction step in Part (2) of the proof of 
Theorem 2.5.9. 


2.5.12 Lemma If I is a non-empty three-valued model for an (infinite
propositional normal) logic program P' and l is an I-partial level mapping such
that P' satisfies (WS) with respect to I and l, then the following hold for
P = P'/∅.


(a) The bottom stratum S(P) of P is non-empty and consists of trivial com- 
ponents only. 


(b) The bottom layer L(P) of P is definite. 


(c) The three-valued model N corresponding to the least two-valued model
for L(P) is consistent with I in the following sense: we have I' ⊆ N, where
I' is the restriction of I to all atoms which are not undefined in N.


(d) P/N satisfies (WS) with respect to I \ N and l|_{I \ N}, where l|_{I \ N}
is the restriction of l to the atoms in I \ N.


Proof: (a) Assume that there exists some component C ⊆ S(P) which is
not trivial. Then there must exist atoms A, B ∈ C with A < B, B < A,
and A ≠ B. Without loss of generality, we can assume that A is chosen such
that l(A) is minimal. Now let A' be any atom occurring in the body of a
clause with head A. If A' occurs positively, then A > B > A > A', and so
A > A'; if A' occurs negatively, then A > A' also. Therefore, by minimality
of the component, we must also have A' > A. Thus, we obtain that all atoms
occurring positively or negatively in the bodies of clauses with head A must
be contained in C. We consider two cases.

Case i. If A ∈ I, then there must be a fact A ← in P; otherwise, by (WSi)
we have a clause A ← L1, ..., Ln (for some n ≥ 1) with L1, ..., Ln ∈ I and
l(A) > l(L_i) for all i, contradicting the minimality of l(A). Since P = P'/∅,
we obtain that A ← is the only clause in P with head A, contradicting the
existence of B ≠ A with B < A.

Case ii. If ¬A ∈ I, then since A was chosen to be minimal with respect
to l, we obtain that condition (WSiib) must hold for each clause
A ← A1, ..., An, ¬B1, ..., ¬Bm with respect to I and l and that m = 0.
Furthermore, all A_i must be contained in C, as already noted above, and
l(A) ≥ l(A_i) for all i by (WSiib). Also, from Case i, we obtain that no A_i can
be contained in I. We have now established that, for all A_i in the body of any
clause with head A, we have l(A) = l(A_i) and ¬A_i ∈ I. The same argument
holds for all clauses with head A_i, for all i, and the argument repeats itself.
Now, from A > B, we obtain D, E ∈ C with A > E (or A = E), D > B (or
D = B), and E refers negatively to D. As we have just seen, we obtain ¬E ∈ I
and l(E) = l(A). Since E refers negatively to D, there is a clause containing E
in its head and ¬D in its body. Since (WSii) holds for this clause, there must
be a literal L in its body with level less than l(E), so that l(L) < l(A) and
L ∈ C, which is a contradiction. We thus have established that all components
are trivial.

We show next that the bottom stratum is non-empty. Indeed, let A be
an atom such that l(A) is minimal. We will show that {A} is a component.
Assume that this is not the case, that is, assume that there is B with B < A.
Then there exist D_1, ..., D_k, for some k ∈ N, such that D_1 = A, D_j refers
to D_{j+1} for all j = 1, ..., k − 1, and D_k refers negatively to some B' with
B' > B (or B' = B).

We show by induction that, for all j = 1, ..., k, the following statements
hold: ¬D_j ∈ I, B < D_j, and l(D_j) = l(A). Indeed, note that for j = 1, that is,
when D_j = A, we have that B < D_j = A and l(D_j) = l(A). Assuming A ∈ I,
we obtain, by minimality of l(A), that A ← is the only clause in P = P'/∅ with
head A, contradicting the existence of B < A. So, ¬A ∈ I, and the assertion
holds for j = 1. Now assume that the assertion holds for some j < k. Then
obviously D_{j+1} > B since A > D_2 > ... > D_{k−1} > D_k > B' > B. Since
¬D_j ∈ I and l(D_j) = l(A), we obtain that (WSii) must hold, and, by the
minimality of l(A), we infer that (WSiib) must hold and that no clause with
head D_j contains negated atoms. So, l(D_{j+1}) = l(D_j) = l(A) holds by (WSiib)
and the minimality of l(A). Furthermore, the assumption D_{j+1} ∈ I can be
rejected by the same argument as for A above; otherwise, D_{j+1} ← would be
the only clause with head D_{j+1} by minimality of l(D_{j+1}) = l(A), contradicting
B < D_{j+1}. This concludes the inductive proof.

Summarizing, we obtain that D_k refers negatively to B' and that ¬D_k ∈ I.
But then there is a clause satisfying (WSii) with head D_k and ¬B' in its body,
and this contradicts the minimality of l(D_k) = l(A). This concludes the proof
of statement (a).

(b) Assume that L(P) is not definite. Then there exists a clause A ← body
in L(P) with a negated literal ¬B occurring in body. But then B < A, and
since the bottom stratum consists of minimal components only, we also have
A < B, that is, A and B are in the same component, contradicting (a).

(c) First, note that in forming the reduct P of P' with respect to ∅, the
third step is the only one in the process which has any effect in that it removes
all non-unit clauses whose heads appear also as heads of unit clauses. Now
let A ∈ I' be an atom with A ∉ N, and assume without loss of generality
that A is chosen such that l(A) is minimal with these properties. By the first
observation and the hypothesis that P' satisfies (WS) with respect to I and
l, there must be a clause A ← L1, ..., Ln in P such that, for all i, L_i is true
with respect to I, and hence true with respect to I', and l(A) > l(L_i). Hence,
all the literals L_i are true with respect to N by minimality of l(A). Thus,
L1, ..., Ln is true in N, and, since N is a model for L(P), we obtain A ∈ N,
which contradicts our assumption.

Now let A ∈ N be an atom with A ∉ I', and assume without loss of
generality that A is chosen such that n is minimal with A ∈ T_{L(P)} ↑ (n + 1).
Then there is a definite clause A ← body in L(P) such that all atoms in
body are true with respect to T_{L(P)} ↑ n. Hence, these atoms are also true with
respect to I', and, since I' is a model for L(P), we obtain A ∈ I', which
contradicts our assumption.

Finally, let ¬A ∈ I'. Then we cannot have A ∈ N; otherwise, A ∈ I'. So,
¬A ∈ N since N is a total model for L(P).

(d) From Lemma 2.5.11, we know that P/N ≼_N P. We distinguish two
cases.

Case i. If A ∈ I \ N, then there must be a clause A ← L1, ..., Lk in P such
that L_i ∈ I and l(A) > l(L_i) for all i. Since it is not possible for A to belong
to N, there must also be a clause in P/N which subsumes A ← L1, ..., Lk
and which therefore satisfies (WSi). So, A satisfies (WSi).

Case ii. If ¬A ∈ I \ N, then, for each clause A ← body1 in P/N, there
must be a clause A ← body in P which is subsumed by A ← body1, and, since
¬A ∈ I, we obtain that condition (WSii) must be satisfied by A and also by
the clause A ← body. Since reduction with respect to N removes only body
literals which are true in N, condition (WSii) is still fulfilled. ∎


We can now proceed with the proof of Theorem 2.5.9. 


Proof of Theorem 2.5.9: The proof will proceed by establishing the
following facts: (1) P satisfies (WS) with respect to M_P and l_P. (2) If I is a
model for P and l is an I-partial level mapping such that P satisfies (WS)
with respect to I and l, then I ⊆ M_P.

(1) Let A ∈ dom(l_P), and suppose that l_P(A) = (α, n). We consider two
cases.

Case i. A ∈ M_P. Then A ∈ T_{L_α} ↑ (n + 1). Hence, there is a definite clause
A ← A1, ..., Ak in L_α with A1, ..., Ak ∈ T_{L_α} ↑ n. Thus, A1, ..., Ak ∈ M_P and
l_P(A) > l_P(A_i) for all i. By Lemma 2.5.11, P/N_α ≼_{N_α} P. So there must be a
clause A ← A1, ..., Ak, L1, ..., Lm in P with literals L1, ..., Lm ∈ N_α ⊆ M_P,
and we obtain l_P(L_j) < l_P(A) for all j = 1, ..., m. So, (WSi) holds in this
case.

Case ii. ¬A ∈ M_P. Let A ← A1, ..., Ak, ¬B1, ..., ¬Bm be a clause in
P, noting that (WSii) is trivially satisfied in case no such clause exists. We
consider the following two subcases.

Subcase ii.a. Assume A is undefined in N_α and was eliminated from P by
reducing it with respect to N_α, that is, A ∈ R_α. Then, in particular, there
must be some ¬A_i ∈ N_α, or some B_j ∈ N_α, which yields l_P(A_i) < l_P(A), or
l_P(B_j) < l_P(A), respectively, and hence one of (WSiia), (WSiic) holds.

Subcase ii.b. Assume ¬A ∈ H, where H is the three-valued model
corresponding to the least two-valued model for L_α. Since P/N_α subsumes P
model-consistently with respect to N_α, we obtain that there must be some A_i
with ¬A_i ∈ H, and, by definition of l_P, we obtain l_P(A) = l_P(A_i) = (α, ω)
and, hence, also l_P(A_{i'}) ≤ l_P(A_i) for all i' ≠ i. Furthermore, since P/N_α is
definite, we obtain that ¬B_j ∈ N_α for all j, and hence l_P(B_j) < l_P(A) for all
j. So, condition (WSiib) is satisfied.

(2) Suppose that I is a non-empty three-valued model for P and that l
is an I-partial level mapping such that P satisfies (WS) with respect to I
and l. First, note that for all models M, N of P with M ⊆ N, we have
(P/M)/N = P/(M ∪ N) = P/N and (P/N)/∅ = P/N.

Let I_α denote I restricted to the atoms which are not undefined in N_α ∪ R_α.
It suffices to show the following: for all α > 0, we have I_α ⊆ N_α ∪ R_α, and
I \ M_P = ∅.

We show next by induction that if α > 0 is an ordinal, then the following
statements hold. (a) The bottom stratum of P/N_α is non-empty and consists
of trivial components only. (b) The bottom layer of P/N_α is definite. (c)
I_α ⊆ N_α ∪ R_α. (d) P/N_{α+1} satisfies (WS) with respect to I \ N_{α+1} and
l|_{I \ N_{α+1}}.

Note that P satisfies the hypothesis of Lemma 2.5.12 and, hence, also its
conclusions. So, on taking α = 1, we have that P/N_1 = P/∅ satisfies (WS)
with respect to I \ N_1 and l|_{I \ N_1}, and by application of Lemma 2.5.12, we
obtain that statements (a) and (b) hold. For (c), note that no atom in R_1 can
be true in I, because no atom in R_1 can appear as head of a clause in P, and
now apply Lemma 2.5.12 (c). For (d), apply Lemma 2.5.12, noting that
P/N_2 ≼_{N_2} P.

For α a limit ordinal, we can show, exactly as in the proof of Lemma 2.5.12
(d), that P satisfies (WS) with respect to I \ N_α and l|_{I \ N_α}. So, Lemma
2.5.12 is applicable, and statements (a) and (b) follow. For (c), let A ∈ R_α.
Then every clause in P with head A contains a body literal which is false in
N_α. By the induction hypothesis, this implies that no clause with head A in P
can have a body which is true in I. So, A ∉ I. Together with Lemma 2.5.12 (c),
this proves statement (c). For (d), apply again Lemma 2.5.12 (d), noting that
P/N_{α+1} ≼_{N_{α+1}} P.

For α = β + 1 a successor ordinal, we obtain by the induction hypothesis
that P/N_β satisfies the hypothesis of Lemma 2.5.12. So, again statements (a)
and (b) follow immediately from this lemma, and (c) and (d) follow as in the
case when α is a limit ordinal.

It remains to show that I \ M_P = ∅. Indeed, by the transfinite induction
argument just given, we obtain that P/M_P satisfies (WS) with respect to
I \ M_P and l|_{I \ M_P}. If I \ M_P is non-empty, then by Lemma 2.5.12 the bottom
stratum S(P/M_P) is non-empty, and the bottom layer L(P/M_P) is definite
and has model M corresponding to the least two-valued model for L(P/M_P).
Hence, by definition of the weakly perfect model M_P for P, we must have that
M ⊆ M_P, which contradicts the fact that M is the least model for L(P/M_P).
Hence, I \ M_P must be empty, and this concludes the proof. ∎


The following corollary follows immediately as a special case.


2.5.13 Corollary A normal logic program P is weakly stratified, that is, has
a total weakly perfect model, if and only if there is a total model I for P and
a (total) level mapping l for P such that P satisfies (WS) with respect to I
and l.


The weakly perfect model is in general different from the Fitting model. 


2.5.14 Proposition Let P be a program, let M1 be its Fitting model, and
let M2 be its (partial) weakly perfect model. Then M1 ⊆ M2.


Proof: Let l1 be an M1-partial level mapping such that P satisfies (F) with
respect to M1 and l1. Then, trivially, P satisfies (WS) with respect to M1 and
l1. Since M2 is the largest model among all models I for which there exists
an I-partial level mapping l for P such that P satisfies (WS) with respect to
I and l, by Theorem 2.5.9, we have that M1 ⊆ M2. ∎


The Fitting model does not in general coincide with the (partial) weakly 
perfect model, nor does it coincide in general with the perfect model for locally 
stratified programs. 


2.5.15 Program Let P be the program consisting of the single clause p ← p.
Then the Fitting model for P is ∅, but the (partial) weakly perfect model for
P is {¬p}. Note that P is locally stratified with perfect (two-valued) model
in which p is false.
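
The claim about the Fitting model in Program 2.5.15 can be reproduced by iterating the Fitting operator from the empty interpretation. The following is a minimal sketch for finite ground programs, assuming the usual reading Φ_P(I) = T_P(I) ∪ ¬F_P(I), where F_P(I) collects the atoms all of whose clauses have a body that is false in I; the function names and the triple encoding of clauses are hypothetical.

```python
def fitting_step(program, true_atoms, false_atoms):
    """One application of Phi_P to the interpretation (true_atoms, false_atoms).

    program: list of (head, positive_body, negative_body) triples of atoms.
    """
    atoms = {a for h, pos, neg in program for a in (h, *pos, *neg)}

    def body_true(pos, neg):
        return all(a in true_atoms for a in pos) and all(b in false_atoms for b in neg)

    def body_false(pos, neg):
        return any(a in false_atoms for a in pos) or any(b in true_atoms for b in neg)

    new_true = {h for h, pos, neg in program if body_true(pos, neg)}
    # an atom becomes false when every clause with this head has a false body
    new_false = {
        a for a in atoms
        if all(body_false(pos, neg) for h, pos, neg in program if h == a)
    }
    return new_true, new_false


def fitting_model(program):
    """Iterate Phi_P from the empty interpretation until nothing changes."""
    true_atoms, false_atoms = set(), set()
    while True:
        nxt = fitting_step(program, true_atoms, false_atoms)
        if nxt == (true_atoms, false_atoms):
            return nxt
        true_atoms, false_atoms = nxt


single_clause = [("p", ("p",), ())]  # the program p <- p
print(fitting_model(single_clause))
```

For the single clause p ← p the iteration stops immediately with no atom true and no atom false, so p stays undefined in the Fitting model, in contrast to the weakly perfect model {¬p}.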


We will see later in Section 6.3 that if P is a locally stratified program, 
then P is weakly stratified, and its (total) weakly perfect model is also its 
perfect model. So, the weakly perfect model semantics unifies two separate 
approaches. On the one hand, it is a generalization of the Fitting semantics 
and allows one to assign a single intended model to each program; on the 
other hand, it generalizes the perfect model semantics for locally stratified 
programs. 


2.5.16 Theorem Definite programs are locally stratified and have a total 
weakly perfect model. 


Proof: The first statement is trivial. For the second statement, let P be a
definite program with least model I. Assign levels l(A) to all A ∈ I according
to Proposition 2.3.2, and set l(B) = 0 for all B ∉ I. Considering the
characterization of the weakly perfect model from Theorem 2.5.9, we observe that
all A ∈ I satisfy (WSi), while all other atoms satisfy (WSiib), and this suffices
to establish the result. ∎


2.6 Well-Founded Models


If we compare Definitions 2.4.8 and 2.5.8 and keep in mind that the main 
idea underlying stratification is to restrict recursion through negation, one 
may be led to ask whether Definition 2.5.8 is the most natural way to achieve 
this in a three-valued setting. Indeed, one may be led to propose the following 
definition. 


2.6.1 Definition Let P be a normal logic program, let I be a model for
P, and let l be an I-partial level mapping for P. We say that P satisfies
(WF) with respect to I and l if each A ∈ dom(l) satisfies one of the following
conditions.


(WFi) A ∈ I, and there is a clause A ← L1, ..., Ln in ground(P) such that
L_i ∈ I and l(A) > l(L_i) for all i.


(WFii) ¬A ∈ I, and for each clause A ← A1, ..., An, ¬B1, ..., ¬Bm in
ground(P) one (at least) of the following conditions holds.


(WFiia) There exists i with ¬A_i ∈ I and l(A) ≥ l(A_i).
(WFiib) There exists j with B_j ∈ I and l(A) > l(B_j).


If A ∈ dom(l) satisfies (WFi), then we say that A satisfies (WFi) with respect
to I and l, and similarly if A ∈ dom(l) satisfies (WFii).


We note that conditions (Fi), (WSi), and (WFi) are identical, and, fur- 
thermore, if P satisfies (WS) with respect to I and l, then it satisfies (WF) 
with respect to I and l. However, replacing (WFi) by a “stratified version” 
such as the following is not satisfactory. 


(SFi) A ∈ I, and there is a clause A ← A1, ..., An, ¬B1, ..., ¬Bm in
ground(P) such that A_i, ¬B_j ∈ I, l(A) ≥ l(A_i), and l(A) > l(B_j)
for all i and j.


Indeed, if we do replace condition (WFi) by condition (SFi), then it is not 
guaranteed that, for a given program, there is a greatest model satisfying the 
desired properties. Consider the program consisting of the two clauses p ← p
and q ← ¬p, the two (total) models {p, ¬q} and {¬p, q}, and the level mapping
l with l(p) = 0 and l(q) = 1. These models are incomparable, yet in both cases
the conditions obtained by replacing (WFi) by (SFi) in (WF) are satisfied. 
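
The two-clause example can be verified mechanically. The following is a minimal sketch for finite ground programs; the function name `satisfies`, its `stratified` flag, and the triple encoding of clauses are hypothetical. It reads (WFiia) and the positive part of (SFi) with a non-strict inequality on levels, which is an assumption, though it is the reading under which the claims about this example hold.

```python
def satisfies(program, true_atoms, false_atoms, level, stratified=False):
    """Check (WF), or the variant using (SFi) instead of (WFi).

    program: list of (head, positive_body, negative_body) triples of atoms.
    level:   dict giving l on every atom that is true or false in I.
    """
    def supported(a):  # (WFi), or (SFi) when stratified=True
        for head, pos, neg in program:
            if head != a:
                continue
            pos_ok = all(
                b in true_atoms
                and (level[b] <= level[a] if stratified else level[b] < level[a])
                for b in pos
            )
            neg_ok = all(c in false_atoms and level[c] < level[a] for c in neg)
            if pos_ok and neg_ok:
                return True
        return False

    def refuted(a):  # (WFii): every clause with head a meets (WFiia) or (WFiib)
        return all(
            any(b in false_atoms and level[b] <= level[a] for b in pos)
            or any(c in true_atoms and level[c] < level[a] for c in neg)
            for head, pos, neg in program
            if head == a
        )

    return (all(supported(a) for a in true_atoms)
            and all(refuted(a) for a in false_atoms))


program = [("p", ("p",), ()), ("q", (), ("p",))]  # p <- p and q <- not p
level = {"p": 0, "q": 1}
for true_atoms, false_atoms in [({"p"}, {"q"}), ({"q"}, {"p"})]:
    print(sorted(true_atoms),
          satisfies(program, true_atoms, false_atoms, level),
          satisfies(program, true_atoms, false_atoms, level, stratified=True))
```

With (WFi) only the model {¬p, q} passes, because p ← p cannot support p at a strictly smaller level; with (SFi) both models pass, which is the failure of uniqueness described above.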

So, in the light of Theorem 2.4.9, Definition 2.6.1 should provide a natural 
stratified version of the Fitting semantics, and indeed it does, see Program 
2.6.12 for an instructive example. Furthermore, the resulting semantics coin- 
cides with another well-known semantics, called the well-founded semantics, 
which is a very satisfactory result. To establish this claim, we need to introduce 
well-founded models, and this we do next.

Given a normal logic program P and I ∈ I_{P,4}, we say that U ⊆ B_P is
an unfounded set (of P) with respect to I if each atom A ∈ U satisfies the
following condition. For each clause A ← body in ground(P) at least one of
the following holds.


(US1) Some (positive or negative) literal in body is false in I. 


(US2) Some (non-negated) atom in body occurs in U. 


2.6.2 Proposition Let P be a program, and let I ∈ I_{P,4}. Then there exists
a greatest unfounded set of P with respect to I.


Proof: If (U_i)_{i ∈ ℐ} is a family of sets, each of which is an unfounded set of P
with respect to I, then it is easy to see that ⋃_{i ∈ ℐ} U_i is also an unfounded set
of P with respect to I. ∎
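
For a finite ground program the greatest unfounded set can be computed directly by starting from all atoms and discarding those that violate both (US1) and (US2). The following is a minimal sketch under that finiteness assumption; the function name and the triple encoding of clauses are hypothetical, and the interpretation I is passed as the pair of its true and false atoms.

```python
def greatest_unfounded_set(program, atoms, true_atoms, false_atoms):
    """Greatest unfounded set of a finite ground program with respect to I.

    program: list of (head, positive_body, negative_body) triples of atoms.
    """
    unfounded = set(atoms)  # start from B_P and discard atoms that cannot stay
    changed = True
    while changed:
        changed = False
        for head, pos, neg in program:
            if head not in unfounded:
                continue
            # (US1): some body literal is false in I
            us1 = (any(a in false_atoms for a in pos)
                   or any(b in true_atoms for b in neg))
            # (US2): some non-negated body atom is itself in the candidate set
            us2 = any(a in unfounded for a in pos)
            if not (us1 or us2):
                unfounded.discard(head)
                changed = True
    return unfounded


program = [("p", ("p",), ()), ("q", (), ("p",))]  # p <- p and q <- not p
print(sorted(greatest_unfounded_set(program, {"p", "q"}, set(), set())))
print(sorted(greatest_unfounded_set(program, {"p", "q"}, {"p"}, set())))
```

On termination every remaining atom satisfies the defining condition with respect to the remaining set, and no atom of any unfounded set is ever discarded, so the result is the greatest one. In the example, p is unfounded with respect to the empty interpretation, while q joins it only once p is taken to be true.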


Let P be a program, and recall the definition of the operator T_P from
Section 2.4. It is straightforward to lift T_P to an operator on I_{P,4}, namely, by
defining T_P(I), for I ∈ I_{P,4}, to be the set of all A ∈ B_P for which there is a
clause A ← body in ground(P) with body true in I with respect to Kleene's
strong three-valued logic. For all I ∈ I_{P,4}, define U_P(I) to be the greatest
unfounded set (of P) with respect to I. Finally, define¹⁴


W_P(I) = T_P(I) ∪ ¬U_P(I)


for all I ∈ I_{P,4}. We call W_P the W_P-operator.
We note that W_P does not restrict to a function on I_{P,3}, which necessitates
using I_{P,4} instead.


2.6.3 Example Consider Program 2.3.1 and I = {p} ∈ I_{P,3}. Then T_P(I) =
{p} and U_P(I) = {p}, so W_P(I) = {p, ¬p} ∉ I_{P,3}.


2.6.4 Proposition Let P be a program. Then W_P is monotonic on I_{P,4}.


Proof: Let I, K ∈ I_{P,4} with I ⊆ K. Then we obtain T_P(I) ⊆ T_P(K) as in
the proof of Proposition 2.4.4. So it suffices to show that every unfounded set
of P with respect to I is also an unfounded set of P with respect to K, and
this fact follows immediately from the definition. ∎


Since W_P is monotonic, it has a least fixed point by the Knaster-Tarski
theorem, Theorem 1.1.10. The least fixed point of W_P is called the
well-founded model for P, giving the well-founded semantics of P. We will show
shortly that the well-founded model is always in I_{P,3}, but let us remark first
that the operator W_P is not order continuous in general nor even ω-continuous,
as the following example shows.


¹⁴ The operator W_P and the well-founded semantics are due to Van Gelder, Ross, and
Schlipf, see [Van Gelder et al., 1991]. However, in the original definition, the operator W_P
was not introduced using FOUR.


2.6.5 Program Let P be the following program.


Then W_P ↑ n = {p(s^k(0)) | k < n} ∪ {¬q(s^k(0)) | 0 < k < n}, and


W_P ↑ ω = {p(s^n(0)) | n ∈ N} ∪ {¬q(s^n(0)) | n ∈ N, n > 0}
        ≠ {p(s^n(0)) | n ∈ N} ∪ {¬q(s^n(0)) | n ∈ N, n > 0} ∪ {¬r}
        = W_P ↑ (ω + 1).


2.6.6 Theorem Let P be a program. Then W_P ↑ α ∈ I_{P,3} for all ordinals α.
In particular, the well-founded model for P is in I_{P,3}.


Proof: We first need some notation. Let M denote the least fixed point of
W_P, and for each atom A ∈ M⁺ let l(A) be the least ordinal β such that
A ∈ W_P ↑ (β + 1).

Now assume that there is an ordinal γ which is least under the condition
that W_P ↑ γ ∉ I_{P,3}. Then γ must be a successor ordinal, since I_{P,3} is a
complete partial order; so let I = W_P ↑ (γ − 1) ∈ I_{P,3}. Now consider the set
U = T_P(I) ∩ U_P(I). Then for each A ∈ U and each clause A ← body in
ground(P) such that body is true in I, we have that some (non-negated) atom
B in body occurs in U_P(I). We obtain B ∈ U_P(I) ∩ I, and since I ⊆ T_P(I)
we get B ∈ U. Now let A ∈ U be chosen such that it is minimal with respect
to l(A) = β, and notice that necessarily β < γ. Then there exists a clause
A ← body in ground(P) with body true in W_P ↑ β ⊆ I, and in particular
B ∈ I and l(B) < l(A) for all (non-negated) atoms B which occur in body.
But now we have just shown that B ∈ U, contradicting minimality of l(A). ∎


2.6.7 Proposition Let P be a program, and let I ∈ I_{P,3}. Then Φ_P(I) ⊆
W_P(I). Furthermore, the three-valued fixed points of W_P are three-valued
supported models for P with respect to Kleene's strong three-valued logic.


Proof: Let A ∈ F_P(I). Then for each clause A ← body in ground(P), we have
that I(body) = f, and so there is a literal L ∈ body with I(L) = f. But then
A is in the greatest unfounded set of P with respect to I, and so A ∈ U_P(I).
This shows that Φ_P(I) ⊆ W_P(I).

Now let M = W_P(M) = T_P(M) ∪ ¬U_P(M). We show that M = Φ_P(M) =
T_P(M) ∪ ¬F_P(M). For this it suffices to show that U_P(M) ⊆ F_P(M). Let
A ∈ U_P(M), and let A ← body be an arbitrary clause in ground(P) with
head A. Noting that U_P(M) is an unfounded set of P with respect to M, if
condition (US1) in the definition of an unfounded set holds, then body is false
in M in Kleene's strong three-valued logic. If (US2) holds, then some atom
in body occurs in U_P(M) and, therefore, is false in M. Consequently, body is
again false in M in Kleene's strong three-valued logic. Hence, A ∈ F_P(M), as
required. ∎


We will now show formally that the well-founded model can be
characterized using Definition 2.6.1.¹⁵


2.6.8 Theorem Let P be a normal logic program with well-founded model
M. Then, in the knowledge ordering, M is the greatest model among all
models I for which there exists an I-partial level mapping l for P such that
P satisfies (WF) with respect to I and l.


Proof: Let M_P be the well-founded model for P, and define the M_P-partial
level mapping l_P as follows: l_P(A) = α, where α is the least ordinal such that
A is not undefined in W_P ↑ (α + 1). The proof will proceed by establishing the
following facts. (1) P satisfies (WF) with respect to M_P and l_P. (2) If I is a
model for P and l is an I-partial level mapping such that P satisfies (WF)
with respect to I and l, then I ⊆ M_P.

(1) Let A ∈ dom(l_P), and suppose that l_P(A) = α. We consider the two
cases corresponding to (WFi) and (WFii).

Case i. A ∈ M_P. Then A ∈ T_P(W_P ↑ α). Hence, there exists a clause
A ← body in ground(P) such that body is true in W_P ↑ α. Thus, for all
L_i ∈ body, we have that L_i ∈ W_P ↑ α. Hence, l_P(L_i) < α = l_P(A) and
L_i ∈ M_P for all i. Consequently, A satisfies (WFi) with respect to M_P and
l_P.

Case ii. ¬A ∈ M_P. Then A ∈ U_P(W_P ↑ α), and so A is contained in the
greatest unfounded set of P with respect to W_P ↑ α. Hence, for each clause
A ← body in ground(P), either (US1) or (US2) holds for this clause with
respect to W_P ↑ α and the unfounded set U_P(W_P ↑ α). If (US1) holds, then
there exists some literal L ∈ body with ¬L ∈ W_P ↑ α. Hence, l_P(L) < α and
condition (WFiia) holds relative to M_P and l_P if L is an atom, or condition
(WFiib) holds relative to M_P and l_P if L is a negated atom. On the other
hand, if (US2) holds, then some (non-negated) atom B in body occurs in
U_P(W_P ↑ α). Hence, l_P(B) ≤ l_P(A), and A satisfies (WFiia) with respect to
M_P and l_P. Thus, we have established that the statement (1) holds.

(2) We show via transfinite induction on α = l(A) that if A ∈ I, or ¬A ∈ I,
then A ∈ W_P ↑ (α + 1), or ¬A ∈ W_P ↑ (α + 1), respectively. For the base case,
note that if l(A) = 0, then A ∈ I implies that A occurs as the head of a fact
in ground(P). Hence, A ∈ W_P ↑ 1. If ¬A ∈ I, then consider the set U of all
atoms B with l(B) = 0 and ¬B ∈ I. We show that U is an unfounded set of
P with respect to W_P ↑ 0, and this suffices since it implies ¬A ∈ W_P ↑ 1 by the
fact that A ∈ U. So let C ∈ U, and let C ← body be a clause in ground(P).
Since ¬C ∈ I and l(C) = 0, we have that C satisfies (WFiia) with respect to
I and l, and so condition (US2) is satisfied showing that U is an unfounded
set of P with respect to I. Assume now that the induction hypothesis holds
for all B ∈ B_P with l(B) < α. We consider two cases.


¹⁵ A different characterization using level mappings, which is nevertheless in the same
spirit, can be found in [Lifschitz et al., 1995].


Case i. A ∈ I. Then A satisfies (WFi) with respect to I and l. Hence,
there is a clause A ← body in ground(P) such that body ⊆ I and l(K) < α
for all K ∈ body. Hence, body ⊆ W_P ↑ α, and we obtain A ∈ T_P(W_P ↑ α), as
required.

Case ii. ¬A ∈ I. Consider the set U of all atoms B with l(B) = α and
¬B ∈ I. We show that U is an unfounded set of P with respect to W_P ↑ α,
and this suffices since it implies ¬A ∈ W_P ↑ (α + 1) by the fact that A ∈ U.
So let C ∈ U, and let C ← body be a clause in ground(P). Since ¬C ∈ I,
we have that C satisfies (WFii) with respect to I and l. If there is a literal
L ∈ body with ¬L ∈ I and l(L) < l(C), then by the induction hypothesis
we obtain ¬L ∈ W_P ↑ α, and therefore condition (US1) is satisfied for the
clause C ← body with respect to W_P ↑ α and U. In the remaining case, we
have that C satisfies condition (WFiia), and there exists an atom B ∈ body
with ¬B ∈ I and l(B) = l(C). Hence, B ∈ U showing that condition (US2) is
satisfied for the clause C ← body with respect to W_P ↑ α and U. Hence, U is
an unfounded set of P with respect to W_P ↑ α. ∎


As a special case, we immediately obtain the following corollary. 


2.6.9 Corollary A normal logic program P has a total well-founded model
if and only if there is a total model I for P and a (total) level mapping l such
that P satisfies (WF) with respect to I and l.


The well-founded model is in general different from the weakly perfect 
model, but always contains it. 


2.6.10 Proposition Let P be a program, let M1 be its (partial) weakly
perfect model, and let M2 be its well-founded model. Then M1 ⊆ M2.


Proof: Let l1 be an M1-partial level mapping such that P satisfies (WS) with
respect to M1 and l1. Then P satisfies (WF) with respect to M1 and l1, as
noted earlier. By Theorem 2.6.8, M2 is largest among all models I for which
there exists an I-partial level mapping l for P such that P satisfies (WF) with
respect to I and l, and hence M1 ⊆ M2. ∎


2.6.11 Program Let P be the program consisting of the two clauses


p ← q, ¬p
q ← p


Then the reduct P1 of P with respect to the empty set is P itself, that is,
P1 = P/∅ = P. The only minimal component of P1 is the set {p, q}, and hence
the bottom layer of P1 is P; it follows that the (partial) weakly perfect model
for P is ∅. However, by applying Theorem 2.6.8, it is easy to see that {¬p, ¬q}
is the well-founded model for P. Indeed, more directly, we have T_P(∅) = ∅,
and U_P(∅) = {p, q}. Therefore, W_P ↑ 2 = W_P(W_P ↑ 1) = W_P(∅ ∪ {¬p, ¬q}) =
{¬p, ¬q} = W_P ↑ 1, and it follows that the well-founded model for P is indeed
{¬p, ¬q}.
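
The stages W_P ↑ 0, W_P ↑ 1, ... of Program 2.6.11 can be recomputed for a finite ground program. The following is a minimal sketch, assuming that the two clauses read p ← q, ¬p and q ← p and that B_P consists of the atoms occurring in the program; the function names and the triple encoding of clauses are hypothetical. Each stage is a pair (true atoms, false atoms).

```python
def wp_step(program, atoms, true_atoms, false_atoms):
    """One application of W_P: returns (T_P(I), U_P(I)) for I = (true, false)."""
    t = {
        h for h, pos, neg in program
        if all(a in true_atoms for a in pos) and all(b in false_atoms for b in neg)
    }
    u = set(atoms)  # shrink B_P down to the greatest unfounded set
    changed = True
    while changed:
        changed = False
        for h, pos, neg in program:
            keeps = (any(a in false_atoms or a in u for a in pos)
                     or any(b in true_atoms for b in neg))
            if h in u and not keeps:
                u.discard(h)
                changed = True
    return t, u


def wp_stages(program):
    """List the stages of W_P from the empty interpretation up to the fixed point."""
    atoms = {a for h, pos, neg in program for a in (h, *pos, *neg)}
    stages = [(set(), set())]
    while True:
        nxt = wp_step(program, atoms, *stages[-1])
        if nxt == stages[-1]:
            return stages
        stages.append(nxt)


program = [("p", ("q",), ("p",)), ("q", ("p",), ())]  # p <- q, not p and q <- p
for true_atoms, false_atoms in wp_stages(program):
    print(sorted(true_atoms), sorted(false_atoms))
```

Under these assumptions the iteration stabilises after a single step with no atom true and both p and q false, matching W_P ↑ 2 = W_P ↑ 1 = {¬p, ¬q}.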


An irregular property of the weakly perfect model semantics is that certain
changes in the program affect the semantics, although intuitively they should
not.


2.6.12 Program (Tweety4) Consider again the program Tweety4 of Ex- 
ample 2.5.5. As noted earlier, this program is a variation of Tweety2 (Pro- 
gram 2.3.9), with the last clause changed; it is intuitively clear that this change 
should not alter the semantics of the program. 

While the program Tweety2, which is locally stratified, has the expected 
weakly perfect model as discussed in Example 2.5.3, the program Tweety4 has 
weakly perfect model 


{penguin(tweety), bird(bob), bird(tweety), ¬flies(tweety)},


as shown in Example 2.5.5. So again we are unable to determine whether or 
not bob is a penguin. 

The well-founded semantics, however, does not suffer from the same
deficiency. Indeed, it turns out to be M ∪ ¬(B_P \ M), where M is as in Example
2.2.7. So in this semantics bob is not a penguin and flies.


An alternative way of characterizing the well-founded semantics is via the
Gelfond-Lifschitz operator from Section 2.3. Recall from Theorem 2.3.7 that
the Gelfond-Lifschitz operator is antitonic. In particular, this means that
for any program P, the operator GL_P², obtained by applying GL_P twice, is
monotonic. Therefore, by the Knaster-Tarski theorem, GL_P² has a least fixed
point, L_P. Note further that I_{P,2} is a complete lattice in the dual of the truth
ordering on I_{P,2}. So, on applying the Knaster-Tarski theorem again, we also
obtain that GL_P² has a greatest fixed point, G_P. Since L_P ⊆ G_P, we obtain
that L_P ∪ ¬(B_P \ G_P) is a three-valued interpretation for P and is, in fact, a
model for P, as we show next, called the alternating fixed point model for P.

We are going to show that the alternating fixed point model coincides 
with the well-founded model. Let us first introduce some temporary notation, 
where P is an arbitrary program. 


L_0 = ∅                     G_0 = B_P
L_{α+1} = GL_P(G_α)         G_{α+1} = GL_P(L_α)       for any ordinal α
L_α = ⋃_{β<α} L_β           G_α = ⋂_{β<α} G_β         for a limit ordinal α.


Since ∅ ⊆ B_P, we obtain L_0 ⊆ L_1 ⊆ G_1 ⊆ G_0, and, by transfinite
induction, it can easily be shown that L_α ⊆ L_β ⊆ G_β ⊆ G_α whenever α < β.
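
For a finite ground program the two sequences stabilise after finitely many steps, so the construction can be run directly. The following is a minimal sketch under that assumption; the names `gl` and `alternating_fixed_point` and the triple encoding of clauses are hypothetical, GL_P(I) is taken to be the least model of the Gelfond-Lifschitz reduct P/I, and B_P is taken to be the set of atoms occurring in the program.

```python
def gl(program, i):
    """GL_P(I): least model of the Gelfond-Lifschitz reduct P/I."""
    # drop clauses with a negative literal 'not b' where b is in I,
    # and strip the negative literals from the clauses that remain
    reduct = [(h, pos) for h, pos, neg in program if not any(b in i for b in neg)]
    model = set()
    changed = True
    while changed:
        changed = False
        for h, pos in reduct:
            if h not in model and all(a in model for a in pos):
                model.add(h)
                changed = True
    return model


def alternating_fixed_point(program):
    """Iterate L_{n+1} = GL_P(G_n), G_{n+1} = GL_P(L_n) from L_0 = {}, G_0 = B_P."""
    atoms = {a for h, pos, neg in program for a in (h, *pos, *neg)}
    low, high = set(), set(atoms)
    while True:
        new_low, new_high = gl(program, high), gl(program, low)
        if (new_low, new_high) == (low, high):
            return low, high, atoms - high  # true atoms, G, false atoms
        low, high = new_low, new_high


program = [("p", ("p",), ()), ("q", (), ("p",))]  # p <- p and q <- not p
low, high, false_atoms = alternating_fixed_point(program)
print(sorted(low), sorted(high), sorted(false_atoms))
```

The atoms in the first component are true, those outside the second are false, and those in between are undefined. For the program p ← p, q ← ¬p both sequences meet at {q}, so q is true and p is false.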


2.6.13 Theorem Let P be a program. Then the following hold.
(a) L_P = GL_P(G_P) and G_P = GL_P(L_P).

(b) For every stable model S for P, we have L_P ⊆ S ⊆ G_P.

(c) M = L_P ∪ ¬(B_P \ G_P) is the well-founded model for P.


Proof: (a) We obtain GL_P²(GL_P(L_P)) = GL_P(GL_P²(L_P)) = GL_P(L_P), so
GL_P(L_P) is a fixed point of GL_P², and hence L_P ⊆ GL_P(L_P) ⊆ G_P. Similarly,
L_P ⊆ GL_P(G_P) ⊆ G_P. Since L_P ⊆ G_P, we get from the antitonicity of GL_P
that L_P ⊆ GL_P(G_P) ⊆ GL_P(L_P) ⊆ G_P. Similarly, since GL_P(L_P) ⊆ G_P, we
obtain GL_P(G_P) ⊆ GL_P²(L_P) = L_P ⊆ GL_P(G_P), so GL_P(G_P) = L_P, and
hence G_P = GL_P²(G_P) = GL_P(L_P).

(b) It suffices to note that S is a fixed point of GL_P, by Theorem 2.3.7,
and, hence, is a fixed point of GL_P².

(c) We prove this statement by applying Theorem 2.6.8. First, we define
an M-partial level mapping l. For convenience, we will take as image set of l,
pairs (α, n) of ordinals, where n ≤ ω, with the lexicographic ordering. This can
be done without loss of generality because any set of pairs of ordinals,
lexicographically ordered, is certainly well-ordered and therefore order-isomorphic
to an ordinal, as noted earlier. For A ∈ L_P, let l(A) be the pair (α, n), where
α is the least ordinal such that A ∈ L_{α+1}, and n is the least ordinal such that
A ∈ T_{P/G_α} ↑ (n + 1). For B ∉ G_P, let l(B) be the pair (β, ω), where β is the
least ordinal such that B ∉ G_{β+1}. We show next by transfinite induction that
P satisfies (WF) with respect to M and l.

Let A ∈ L_1 = T_{P/B_P} ↑ ω. Since P/B_P consists of exactly all clauses from
ground(P) which contain no negation, we have that A is contained in the least
two-valued model for a definite subprogram of P, namely, P/B_P, and (WFi)
is satisfied, by Proposition 2.3.2. Now let ¬B ∈ ¬(B_P \ G_P) be such that
B ∈ (B_P \ G_1) = B_P \ T_{P/∅} ↑ ω. Since P/∅ contains all clauses from ground(P)
with all negative literals removed, we obtain that each clause in ground(P)
with head B must contain a positive body literal C ∉ G_1, which, by definition
of l, must have the same level as B; hence, (WFiia) is satisfied.

Assume now that, for some ordinal α, we have shown that A satisfies (WF)
with respect to M and l for all n < ω and all A ∈ B_P with l(A) < (α, n).

Let A ∈ L_{α+1} \ L_α = T_{P/G_α} ↑ ω \ L_α. Then A ∈ T_{P/G_α} ↑ n \ L_α for some
n ∈ N; note that all (negative) literals which were removed by the
Gelfond-Lifschitz transformation from clauses with head A have level less than (α, 0).
Then the assertion that A satisfies (WF) with respect to M and l follows
again by Proposition 2.3.2.

Let A ∈ (B_P \ G_{α+1}) ∩ G_α. Then we have A ∉ T_{P/L_α} ↑ ω. Let A ←
A1, ..., Ak, ¬B1, ..., ¬Bm be a clause in ground(P). If B_j ∈ L_α for some j,
then l(A) > l(B_j). Otherwise, since A ∉ T_{P/L_α} ↑ ω, we have that there exists
A_i with A_i ∉ T_{P/L_α} ↑ ω, and hence l(A) ≥ l(A_i), and this suffices.

This finishes the proof that P satisfies (WF) with respect to M and l. It 
therefore only remains to show that M is greatest with this property. 

So assume that M1 ≠ M is the greatest model such that P satisfies (WF)
with respect to M1 and some M1-partial level mapping l1.

Assume L ∈ M1 \ M, and, without loss of generality, let the literal L be
chosen such that l1(L) is minimal. We consider the following two cases.

Case i. If $L = A$ is an atom, then there exists a clause $A \leftarrow \mathit{body}$ in
$\mathrm{ground}(P)$ such that $\mathit{body}$ is true in $M_1$ and $l_1(L) < l_1(A)$ for all literals
$L$ in $\mathit{body}$. Hence, $\mathit{body}$ is true in $M$, and $A \leftarrow \mathit{body}$ transforms to a clause
$A \leftarrow A_1, \dots, A_n$ in $P/G_P$ with $A_1, \dots, A_n \in L_P = T_{P/G_P} \uparrow \omega$. But this implies
$A \in M$, contradicting $A \in M_1 \setminus M$.

Case ii. If $L = \neg A \in M_1 \setminus M$ is a negated atom, then $\neg A \in M_1$ and
$A \in G_P = T_{P/L_P} \uparrow \omega$, so $A \in T_{P/L_P} \uparrow n$ for some $n \in \mathbb{N}$. We show by induction
on $n$ that this leads to a contradiction to finish the proof.

If $A \in T_{P/L_P} \uparrow 1$, then there is a unit clause $A \leftarrow$ in $P/L_P$, and any
corresponding clause $A \leftarrow \neg B_1, \dots, \neg B_k$ in $\mathrm{ground}(P)$ satisfies $B_1, \dots, B_k \notin L_P$.
Since $\neg A \in M_1$, we also obtain by Theorem 2.6.8 that there is
$i \in \{1, \dots, k\}$ such that $B_i \in M_1$ and $l_1(B_i) < l_1(A)$. By minimality of $l_1(A)$, we
obtain $B_i \in M$, and hence $B_i \in L_P$, which contradicts $B_i \notin L_P$.

Now assume that there is no $\neg B \in M_1 \setminus M$ with $B \in T_{P/L_P} \uparrow k$ for any
$k < n+1$, and let $\neg A \in M_1 \setminus M$ with $A \in T_{P/L_P} \uparrow (n+1)$. Then there is a
clause $A \leftarrow A_1, \dots, A_m$ in $P/L_P$ with $A_1, \dots, A_m \in T_{P/L_P} \uparrow n \subseteq G_P$, and we
note that we cannot have $\neg A_i \in M_1 \setminus M$ for any $i \in \{1, \dots, m\}$ by our current
induction hypothesis. Furthermore, it is also impossible for $\neg A_i$ to belong to
$M$ for any $i$; otherwise, we would have $A_i \in B_P \setminus G_P$. Thus, we conclude
that we cannot have $\neg A_i \in M_1$ for any $i$. Moreover, there is a corresponding
clause $A \leftarrow A_1, \dots, A_m, \neg B_1, \dots, \neg B_{m_1}$ in $\mathrm{ground}(P)$ with $B_1, \dots, B_{m_1} \notin L_P$.
Hence, by Theorem 2.6.8, we know that there is $i \in \{1, \dots, m_1\}$ such
that $B_i \in M_1$ and $l_1(B_i) < l_1(A)$. By minimality of $l_1(A)$, we conclude that
$B_i \in M$, so that $B_i \in L_P$, and this contradicts $B_i \notin L_P$. ∎


It follows from Theorem 2.6.13 (b) that total well-founded models are
unique stable models. The converse, however, does not hold. Indeed, Program
2.4.14 has well-founded model $\emptyset$, as can easily be seen by noting that $GL_P(\emptyset) = B_P$
and $GL_P(B_P) = \emptyset$.
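
The computation behind this kind of remark can be sketched for finite ground programs. The
following Python fragment is only an illustration under stated assumptions: the clause
encoding, the helper names (clause, reduct, least_model, gl, alternating_fixed_point,
stable_models), and the three-clause program are all hypothetical, and the program is a
stand-in chosen to show the same behaviour; it is not claimed to be Program 2.4.14. The
iteration follows the sets used in the proof above, with $L_{\alpha+1} = GL_P(G_\alpha)$ and
$G_{\alpha+1} = GL_P(L_\alpha)$, starting from $L_0 = \emptyset$ and $G_0 = B_P$.

```python
from itertools import combinations

# Assumed encoding: a ground clause is (head, positive body atoms, negated body atoms).
def clause(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

def reduct(program, interp):
    """Gelfond-Lifschitz transformation P/I: delete every clause with a negated
    body atom in I, then delete the remaining negative literals."""
    return [(head, pos) for (head, pos, neg) in program if not (neg & interp)]

def least_model(definite_program):
    """Least model of a finite ground definite program, by naive iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_program:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return frozenset(model)

def gl(program, interp):
    """GL_P(I): the least model of the reduct P/I."""
    return least_model(reduct(program, frozenset(interp)))

def alternating_fixed_point(program, base):
    """Iterate L' = GL_P(G), G' = GL_P(L) from L = {} and G = B_P until stable."""
    low, high = frozenset(), frozenset(base)
    while True:
        new_low, new_high = gl(program, high), gl(program, low)
        if (new_low, new_high) == (low, high):
            return low, high
        low, high = new_low, new_high

def stable_models(program, base):
    """All I with GL_P(I) = I, found by brute force over subsets of the base."""
    atoms = sorted(base)
    candidates = (frozenset(c) for r in range(len(atoms) + 1)
                  for c in combinations(atoms, r))
    return [i for i in candidates if gl(program, i) == i]

# Hypothetical program:  p <- not q.   q <- not p.   p <- not p.
BASE = {"p", "q"}
PROGRAM = [clause("p", neg=["q"]), clause("q", neg=["p"]), clause("p", neg=["p"])]

L_P, G_P = alternating_fixed_point(PROGRAM, BASE)
true_atoms, false_atoms = L_P, frozenset(BASE) - G_P  # the well-founded model
```

For this assumed program, no atom is decided by the well-founded model, although the
brute-force search finds exactly one stable model, so the stable model is unique without
the well-founded model being total.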


2.6.14 Theorem Let P be a program with a total Fitting model. Then P 
has a total well-founded model and a total weakly perfect model. Moreover, 
P also has a unique stable and a unique supported model. Furthermore, all 
these models coincide. 


Proof: By Propositions 2.5.14 and 2.6.10, P has a total well-founded and a
total weakly perfect model, both of which coincide with the Fitting model.
By Theorem 2.6.13 (b), P has a unique stable model, and this coincides with
the well-founded model by Theorem 2.6.13 (c). Finally, by Proposition 2.4.13,
P has a unique supported model, and this model coincides with its Fitting
model. ∎


Chapter 3 


Topology and Logic Programming 


In this chapter, we consider the role of topology in logic programming se- 
mantics. There is a considerable history of topology being used in computer 
science in general, much of it stemming from the role of the Scott topology in 
domain theory and in conventional programming language semantics. How- 
ever, topological methods have been employed in a number of other areas 
of importance in computing, including digital topology in image processing, 
software engineering, and the use of metric spaces in concurrency, for example.
In addition, topological methods and ideas have been used in foundational
investigations via the topology of observable properties of M.B. Smyth,
see [Smyth, 1992]. Again, Blair et al. have made considerable use of convergence
spaces in unifying discrete and continuous models of computation
and, hence, in providing models for hybrid systems. Indeed, these authors, 
see [Blair et al., 1999] and [Blair and Remmel, 2001], for example, view any 
model of computation in which there is a notion of evolving state as a dy- 
namical system. Such models of computation include, of course, Turing ma- 
chines, finite state machines, logic programs, neural networks, etc. On the 
other hand, convergence spaces, as already noted earlier, provide a very gen- 
eral framework in which to study convergence and continuity, either by means 
of nets or by filters, and include topologies as a special case. It is shown in 
[Blair et al., 1999] and [Blair and Remmel, 2001] that the execution traces of 
a dynamical system can be realized as those solutions of a certain type of 
constraint on a convergence space that yield continuous instances of the con- 
straint. This work provides a foundation for hybrid systems. Furthermore, the 
papers [Blair et al., 1997a, Blair et al., 1997b, Blair, 2007, Blair et al., 2007] 
give many other interesting applications of ideas of a dynamical systems and 
analytical nature to the theory of computation, including logic programming 
in particular. 

Here, we want to explore the role of topology in finding models for logic 
programs and its role as a foundational framework for logic programming 
semantics.¹ Thus, our focus is the study of topologies and their properties
on spaces I(X,T) of interpretations, and we work with general truth sets 
T wherever possible, only imposing conditions as appropriate and necessary. 
There are two main topologies which we discuss in this chapter and which have 


¹ The thesis [Ferry, 1994] and the paper [Heinze, 2003] contain results concerning the
characterization in topological terms of the various standard models for logic programs
discussed in Chapter 2.

important properties in relation to logic programming semantics, namely, the
well-known Scott topology and a topology, called the Cantor topology by us,²
which has connections with the Scott topology. Our goal is to establish the 
basic facts about these two topologies and to consider continuity of semantic 
operators in them. In fact, we deal with continuity in the Scott topology in this 
chapter, but postpone our discussion of continuity in the Cantor topology until 
Chapter 5. Later on, we will see how the results we establish can be employed 
in studying acceptable programs and termination issues, and we will also see 
that the topologies we discuss underlie the fixed-point structures we introduce 
in later chapters. 

In fact, in many ways it is the convergence properties of these topologies 
which are most important, as already noted in the Introduction, and therefore 
we take convergence as a fundamental notion and base our discussion upon 
it. Nevertheless, we quite easily obtain descriptions of the topologies we study 
in terms of more familiar notions such as basic open sets. Actually, conver- 
gence per se is formalized completely generally via the concept of convergence 
spaces, and therefore we take convergence spaces as our starting point. In fact, 
we focus mainly on the so-called convergence classes, which form a subclass 
of the convergence spaces, because convergence classes correspond to conven- 
tional topologies, whereas convergence spaces give more general theories of 
convergence than are needed here. 

As can be seen from the results of Chapter 2, the notion of order is not 
entirely satisfactory as a foundation for logic programming semantics due to 
the failure in general of the immediate consequence operator to be monotonic 
in the natural order present. However, order can be expressed through conver- 
gence, as we show here. Indeed, convergence spaces and convergence classes 
are to a considerable extent appropriate structures with which to investigate 
semantical questions in computer science in general and in logic programming 
in particular. 


3.1 Convergence Spaces and Convergence Classes 


The theory of convergence can be based either on nets or on filters,³ and
these two approaches are equivalent in that any result which can be estab- 
lished by the one can equally well be established by the other. We will work 
exclusively with nets since they give rather intuitive descriptions of the sort 
of conditions we want to consider in logic programming. The facts we need 


² The Cantor topology was introduced in [Batarekh and Subrahmanian, 1989a] and in
[Batarekh and Subrahmanian, 1989b], see also [Batarekh, 1989], under a restriction called 
the matching condition and was treated in complete generality in [Seda, 1995]. 

³ Our basic references to the theory of nets and filters are the books [Kelley, 1975] and
[Willard, 1970].

concerning nets, and our notation in this respect, can be found in the Appendix.⁴
Indeed, all the basic facts we need concerning general topology have
been collected together in the Appendix.

We begin with some basic definitions. 


3.1.1 Definition Let $X$ be a non-empty set. We call the pair $(X, \mathcal{S}) = (X, (\mathcal{S}_s)_{s \in X})$
a convergence space if, for each $s \in X$, $\mathcal{S}_s$ is a non-empty collection
of nets in $X$ with the following properties.


(1) If $(s_i)$ is a constant net, that is, $s_i = s \in X$ for all $i$, then $(s_i) \in \mathcal{S}_s$.
(2) If $(s_i)_{i \in \mathcal{I}} \in \mathcal{S}_s$ and $(t_j)_{j \in \mathcal{J}}$ is a subnet of $(s_i)$, then $(t_j)_{j \in \mathcal{J}} \in \mathcal{S}_s$.


If $(s_i) \in \mathcal{S}_s$, we say $s_i$ converges to $s$ and sometimes write $s_i \to s$ to indicate
this.


3.1.2 Definition Let $X$ be a non-empty set, and suppose that $\mathcal{C}$ is a class of
pairs $((s_i), s)$, where $(s_i)_{i \in \mathcal{I}}$ is a net in $X$ and $s$ is an element of $X$. We call $\mathcal{C}$
a convergence class if it satisfies the conditions below, in which we will write
that $s_i$ converges $(\mathcal{C})$ to $s$ or that $\lim_i s_i = s$ $(\mathcal{C})$ if and only if $((s_i), s) \in \mathcal{C}$.


(1) (Constant nets) If $(s_i)$ is a net such that $s_i = s$ for all $i$, then $((s_i), s) \in \mathcal{C}$.


(2) (Convergence of subnets) If $(s_i)$ converges $(\mathcal{C})$ to $s$, then so does every
subnet of $(s_i)$.


(3) (Non-convergence)⁵ If $(s_i)$ does not converge $(\mathcal{C})$ to $s$, then there is a
subnet of $(s_i)$, of which no subnet converges $(\mathcal{C})$ to $s$.


(4) (Iterated limits) Suppose that $I$ is a directed set and that $J_m$ is a directed
set for each $m \in I$. Form the fibred product $F' = I \times_I \bigcup_{m \in I} J_m =
\{(m, n) \mid m \in I, n \in J_m\}$, and suppose that $x : F' \to X$. Let $F$ denote
the product directed set⁶ $I \times \prod_{m \in I} J_m$, and let $r : F \to F'$ be defined
by $r(m, f) = (m, f(m))$. If $\lim_m \lim_n x(m, n) = s$ $(\mathcal{C})$, then the net $x \circ r$
converges $(\mathcal{C})$ to $s$.


The principal result concerning convergence classes, see [Kelley, 1975, 
Chapter 2] or [Seda et al., 2003], is that each convergence class C on X in- 
duces a closure operator on X which in turn induces a topology on X, in 
accordance with Theorem A.2.9, in which the convergent nets and their limits 
are precisely those given in C. More precisely, we have the following result 
which shows that the notion of convergence may be taken as fundamental. 


⁴ We refer the reader again to [Kelley, 1975] for more details.

⁵ This formulation is as given in [Kelley, 1975]. An equivalent form, given in positive
terms, is as follows: if every subnet of a net $(s_i)$ has a subnet converging to $s$, then $(s_i)$
converges to $s$.

⁶ By a product directed set $\prod_{m \in I} J_m$, we understand, of course, the pointwise ordering
on the product $\prod_{m \in I} J_m$ of the directed sets $J_m$; thus, for elements $f$ and $g$ of $\prod_{m \in I} J_m$,
we have $f \leq g$ if and only if $f(m) \leq g(m)$ for each $m \in I$.

3.1.3 Theorem Let $\mathcal{C}$ be a convergence class in a non-empty set $X$. For each
$A \subseteq X$, let $A^c = \{s \in X \mid \text{there is a net } (s_i) \text{ in } A \text{ with } ((s_i), s) \in \mathcal{C}\}$. Then
$\cdot^c$ is a closure operator on $X$ and, hence, defines a topology $\mathcal{T}$ on $X$, called
the topology associated with $\mathcal{C}$. Moreover, we have $((s_i), s) \in \mathcal{C}$ if and only if
$s_i \to s$ with respect to $\mathcal{T}$.

Conversely, suppose that $\mathcal{T}$ is a topology on a non-empty set $X$. Let $\mathcal{C}$
denote the set of all pairs $((s_i), s)$, where $s \in X$ and $(s_i)_{i \in \mathcal{I}}$ is a net in $X$
which converges to $s$ in the topology $\mathcal{T}$. Then $\mathcal{C}$ is a convergence class in $X$
whose associated topology coincides with $\mathcal{T}$.


Proof: The proof of the first part of the theorem is well-known and will be 
omitted, and we refer the reader to [Kelley, 1975] or [Seda et al., 2003] for 
details. 

For the converse, we note that properties (1), (2), and (3) in the definition 
of a convergence class are immediate for the class C by elementary properties 
of nets converging in a topology (see Definition A.3.3). Property (4) of the 
definition follows from the Theorem on Iterated Limits, see [Kelley, 1975, Page 
69], and, hence, the class $\mathcal{C}$ is a convergence class. Finally, let $A \subseteq X$ be an
arbitrary subset of $X$. By the definition of the closure operator determined by
$\mathcal{C}$ as given in the first statement in the theorem, we have $s \in A^c$ if and only
if there is a net $(s_i)$ in $A$ converging to $s$. But this is equivalent to $s \in \overline{A}$ by
statement (a) of Theorem A.3.5, and it follows that the associated topology
of $\mathcal{C}$ coincides with $\mathcal{T}$. ∎


Another basic definition is that of continuous function, as follows. 


3.1.4 Definition Let $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ be convergence spaces. Then a function
$f : X \to Y$ is said to be continuous at $s \in X$ if $(f(s_i)) \in \mathcal{T}_{f(s)}$ whenever
$(s_i) \in \mathcal{S}_s$, that is, if $f(s_i)$ converges to $f(s)$ whenever $s_i$ converges to $s$.


There are a few points to be made about these definitions. First, suppose
that $\mathcal{C}$ is a convergence class on $X$. For each $s \in X$, let $\mathcal{S}_s$ denote the collection
of nets $(s_i)$ such that $((s_i), s) \in \mathcal{C}$. Then conditions (1) and (2) in the definition
of $\mathcal{C}$ show that $(X, (\mathcal{S}_s)_{s \in X})$ is, in fact, a convergence space. Second, since a
function $f : X \to Y$ between topological spaces is continuous at $s \in X$
if and only if $f(s_i)$ converges to $f(s)$ whenever the net $s_i$ converges to $s$,
see (d) of Theorem A.3.5, we note that the notion of continuity just defined
coincides with topological continuity when the convergence spaces in question
are actually convergence classes. Finally, definitions equivalent to these can
be given entirely in terms of filters, but we omit the details.⁷

It is known that the full generality of convergence spaces is needed in mod- 
elling hybrid systems, as observed earlier. Here, in fact, all the convergence 
conditions we consider give rise to convergence classes and, hence, to topolo- 
gies, rather than to strict convergence spaces, and therefore our focus is on 
convergence classes as already noted. 


⁷ We refer the reader to [Seda et al., 2003] for a treatment in terms of filters.


3.2 The Scott Topology on Spaces of Valuations


The Scott topology is normally encountered in domain theory in the con- 
text of solving recursive domain equations and in understanding self reference. 
However, it also has a role in logic programming, which we discuss in this sec- 
tion, and indeed, in a certain sense, it naturally underpins definite programs. 

We begin with the following basic definition and refer the reader to the 
Appendix, both for proofs of the results we simply state here and also for a 
development of the elements of the Scott topology. 


3.2.1 Definition Let $(D, \sqsubseteq)$ be a complete partial order. A set $O \subseteq D$ is
called Scott open⁸ if it satisfies the following two conditions: (1) $O$ is upwards
closed in the sense that whenever $x \in O$ and $x \sqsubseteq y$, we have $y \in O$, and (2)
whenever $A \subseteq D$ is directed and $\bigsqcup A \in O$, then $A \cap O \neq \emptyset$.


In the case of a domain $D$, this topology has a rather simple description
in that the collection $\{\uparrow a \mid a \in D_c\}$ is a base for the Scott topology on $D$,
where $\uparrow x = \{y \in D \mid x \sqsubseteq y\}$ for any $x \in D$, as we see in the next proposition.


3.2.2 Proposition Let $(D, \sqsubseteq)$ be a domain. Then the following statements
hold.


(a) The Scott-open sets form a topology on $D$ called the Scott topology.
(b) For each compact element $a \in D_c$, the set $\uparrow a$ is a Scott-open set.
(c) The collection $\{\uparrow a \mid a \in D_c\}$ is a base for the Scott topology on $D$.


Proof: (a) That $\emptyset$ and $D$ are Scott open is easy to see. If $O_1$ and $O_2$ are
Scott open, if $x \in O_1 \cap O_2$, and if $x \sqsubseteq y$, then it is clear that $y \in O_1 \cap O_2$.
Suppose that $A$ is directed and $\bigsqcup A \in O_1 \cap O_2$. Then there are $a_1, a_2 \in A$
such that $a_1 \in O_1$ and $a_2 \in O_2$. Therefore, by directedness of $A$, there is
$a_3 \in A$ such that $a_1 \sqsubseteq a_3$ and $a_2 \sqsubseteq a_3$. But then $a_3 \in O_1 \cap O_2$, and hence
$a_3 \in A \cap (O_1 \cap O_2)$, as required to see that $O_1 \cap O_2$ is Scott open. Finally, it is
easy to check that a union $\bigcup_{i \in \mathcal{I}} O_i$ of Scott-open sets $O_i$, $i \in \mathcal{I}$, is itself Scott
open.

(b) If $x \in \uparrow a$ and $x \sqsubseteq y$, then it is immediate that $y \in \uparrow a$. Now suppose
that $A$ is directed and $\bigsqcup A \in \uparrow a$. Then $a$ is compact and $a \sqsubseteq \bigsqcup A$. Therefore,
there is $a' \in A$ such that $a \sqsubseteq a'$. Hence, $a' \in \uparrow a$ by definition of $\uparrow a$, that is,
$a' \in A \cap \uparrow a$ showing that $A \cap \uparrow a \neq \emptyset$, as required.

(c) First we show that this collection is a base for some topology on $D$.
Let $x \in D$ be arbitrary. Then $\mathrm{approx}(x)$ is directed and is non-empty; let
$a \in \mathrm{approx}(x)$. Then, $a \in D_c$ and $a \sqsubseteq x$, so that $x \in \uparrow a$, and hence $\bigcup_{a \in D_c} \uparrow a = D$.
Now suppose that $a_1$ and $a_2$ are compact elements and that $x \in \uparrow a_1 \cap \uparrow a_2$.
Then $a_1, a_2 \in \mathrm{approx}(x)$, and by directedness there is $a_3 \in \mathrm{approx}(x)$ such
that $a_1 \sqsubseteq a_3$ and $a_2 \sqsubseteq a_3$. Hence, we have $a_3 \in \uparrow a_1 \cap \uparrow a_2$. But $\uparrow a_1 \cap \uparrow a_2$ is
clearly upwards closed, and so we obtain $x \in \uparrow a_3 \subseteq \uparrow a_1 \cap \uparrow a_2$ and $a_3 \in D_c$, as
required.

Finally, we show that the collection $\{\uparrow a \mid a \in D_c\}$ is a base for the Scott
topology on $D$. Let $O$ be any Scott-open set, and let $x \in O$. Then $\mathrm{approx}(x)$
is directed, and we have that $\bigsqcup \mathrm{approx}(x) = x \in O$. Therefore, there is some
$a \in \mathrm{approx}(x)$ such that $a \in O$. But then $a \in D_c$ and $a \sqsubseteq x$. Therefore,
$x \in \uparrow a \subseteq O$, where $a$ is a compact element, as required. ∎


⁸ See [Abramsky and Jung, 1994, Gierz et al., 2003, Stoltenberg-Hansen et al., 1994].
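
As a finite sanity check of Definition 3.2.1 and Proposition 3.2.2, the following Python
sketch works in the powerset of a three-element set ordered by inclusion. This is an
assumption made purely for illustration: the poset is finite, so every element is compact
and every directed family contains its own supremum. The names powerset, up, is_directed
and is_scott_open are hypothetical helpers, not notation from the text.

```python
from itertools import combinations

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

# The finite domain: all subsets of {a, b, c}, ordered by set inclusion.
D = powerset("abc")

def up(a):
    """The set of all elements of D above a."""
    a = frozenset(a)
    return frozenset(y for y in D if a <= y)

def is_directed(family):
    """Non-empty, and any two members have an upper bound inside the family."""
    return len(family) > 0 and all(
        any(x | y <= z for z in family) for x in family for y in family)

def is_scott_open(o):
    """Conditions (1) and (2) of the definition, checked by brute force."""
    o = frozenset(o)
    if not all(y in o for x in o for y in D if x <= y):
        return False  # condition (1) fails: not upwards closed
    # Condition (2): a directed family whose supremum lies in o must meet o.
    return all(family & o
               for family in powerset(D)
               if is_directed(family) and frozenset().union(*family) in o)

scott_open_sets = [o for o in powerset(D) if is_scott_open(o)]

# Base property: each Scott-open set is the union of the sets up(a) for a in it.
is_base = all(o == frozenset().union(*[up(a) for a in o]) for o in scott_open_sets)
```

In this finite setting the Scott-open sets are exactly the upwards closed sets, which is
why the brute-force check is feasible; the infinite case needs the argument given in the
proof.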


We refer to the elements of the Scott topology as Scott-open sets. Likewise, 
we refer to neighbourhoods in the Scott topology as Scott neighbourhoods, 
and so on. 

We next give a simple example of the Scott topology in the context of $I_{P,2}$.


3.2.3 Example Consider the definite program $P$ as follows.


$p(a) \leftarrow$
$p(s(X)) \leftarrow p(X)$


This program is intended to compute the natural numbers, where $a$ is the
natural number 0, and $s$ is the successor function on the natural numbers.

In accordance with Theorem 1.3.4, the set $I_P = I_{P,2}$ of all two-valued
interpretations for $P$ is a domain, and, furthermore, its compact elements are
the finite subsets $I$ of $B_P$, where as usual we are identifying a two-valued
interpretation with the set of ground atoms which are true in $I$. Therefore, a
typical basic open set in the Scott topology on $I_P$ is the set $\uparrow I = \{I' \subseteq B_P \mid
I \subseteq I'\}$ of all supersets of the finite set $I$.
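
A small Python sketch may help to make this example concrete. It assumes an encoding that
is not in the text: the ground atom $p(s^n(a))$ is represented by the integer $n$, so that
an interpretation becomes a set of natural numbers. The function names tp, iterate and
in_basic_open are hypothetical.

```python
def tp(interp):
    """Immediate consequence operator of the program above, with the atom
    p(s^n(a)) encoded as the integer n: the fact gives 0, the rule gives n + 1."""
    return frozenset({0} | {n + 1 for n in interp})

def iterate(k, start=frozenset()):
    """Apply tp k times, starting from the given interpretation."""
    current = frozenset(start)
    for _ in range(k):
        current = tp(current)
    return current

def in_basic_open(finite_i, interp):
    """Membership of interp in the basic open set of all supersets of finite_i."""
    return frozenset(finite_i) <= frozenset(interp)
```

Under this encoding, iterate(3) is the interpretation {0, 1, 2}, which lies in the basic
open set determined by {0, 2}, whereas iterate(2) does not.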


One of our main aims here is to present the Scott topology in terms of
convergence, and we proceed to do this next.⁹


3.2.4 Theorem Let $(D, \sqsubseteq)$ denote a domain, let $(s_i)$ be a net in $D$, and let
$s$ denote an element of $D$. Define $\lim_i s_i = s$ $(\mathcal{C})$ to mean that


for each $a \in \mathrm{approx}(s)$, there is an index $i_0$ such that $a \sqsubseteq s_i$ whenever $i_0 \leq i$.


Then the condition just given determines a convergence class $\mathcal{C}$ whose associated
topology is the Scott topology on $D$. Therefore, a net $(s_i)$ converges to $s$
in the Scott topology on $D$ if and only if it satisfies the condition just stated.


Proof: We first verify that the conditions (1), (2), (3), and (4) in the definition
of a convergence class, see Definition 3.1.2, hold with the given meaning of
$\lim_i s_i = s$ $(\mathcal{C})$.


⁹ For further details of this result and of several more in this chapter, see [Seda, 2002].

(1) Suppose that $s_i = s$ for all $i \in \mathcal{I}$ is a constant net, and let
$a \in \mathrm{approx}(s)$. Thus, $a$ is a compact element satisfying $a \sqsubseteq s$. Therefore, we have
$a \sqsubseteq s_i$ for all $i$. So, $((s_i), s) \in \mathcal{C}$.

(2) Suppose that $((s_i), s) \in \mathcal{C}$ and that $(t_j)_{j \in \mathcal{J}}$ is a subnet of $(s_i)_{i \in \mathcal{I}}$. Thus,
there is a function $\phi : \mathcal{J} \to \mathcal{I}$ such that (i) $t_j = s_{\phi(j)}$ for all $j \in \mathcal{J}$, and (ii)
for each $i_0 \in \mathcal{I}$, there is $j_0 \in \mathcal{J}$ such that $i_0 \leq \phi(j)$ whenever $j_0 \leq j$. Let
$a \in \mathrm{approx}(s)$ be arbitrary. Then because $((s_i), s) \in \mathcal{C}$, there is an $i_0 \in \mathcal{I}$ such
that $a \sqsubseteq s_i$ whenever $i_0 \leq i$. Since $(t_j)$ is a subnet of $(s_i)$, there is $j_0 \in \mathcal{J}$ such
that $i_0 \leq \phi(j)$ whenever $j_0 \leq j$. But then we have $a \sqsubseteq s_{\phi(j)}$ whenever $j_0 \leq j$,
that is, $a \sqsubseteq t_j$ whenever $j_0 \leq j$. Therefore, $((t_j), s) \in \mathcal{C}$.

(3) Suppose that $((s_i), s) \notin \mathcal{C}$. Then there exists $a \in \mathrm{approx}(s)$ such that
for each index $i_0$ there is an index $j_0 \geq i_0$ with $a \not\sqsubseteq s_{j_0}$. Let $\mathcal{J}$ denote the
collection of all these $j_0$. Then clearly $\mathcal{J}$ is cofinal in $\mathcal{I}$, and hence $(t_j)_{j \in \mathcal{J}}$ is
a subnet of $(s_i)$, where $t_j = s_j$ for each $j \in \mathcal{J}$. It is clear that if $(r_k)$ is any
subnet of $(t_j)$, then we have $((r_k), s) \notin \mathcal{C}$.

(4) Suppose that the conditions stated in (4) of Definition 3.1.2 all hold
and that $\lim_m \lim_n x(m, n) = s$ $(\mathcal{C})$, where $x : F' \to D$. Let $a \in \mathrm{approx}(s)$
be arbitrary. Because $\lim_m \lim_n x(m, n) = s$ $(\mathcal{C})$, there is an index $m_0 \in I$
such that $a \sqsubseteq \lim_n x(m, n)$ whenever $m \geq m_0$. But now we see that
$a \in \mathrm{approx}(\lim_n x(m, n))$. Therefore, for each fixed $m \geq m_0$, there is an index
$n_m \in J_m$ such that $a \sqsubseteq x(m, n)$ whenever $n \geq n_m$. Define $f \in \prod_{m \in I} J_m$ by
setting $f(m) = n_m \in J_m$ whenever $m \geq m_0$, and otherwise letting $f(m) \in J_m$
be arbitrary. Suppose that $(m', g) \geq (m_0, f)$. Then $m' \geq m_0$ and $g \geq f$, so that
$g(m') \geq f(m') = n_{m'}$, that is, $g(m') \geq n_{m'}$. Thus, $a \sqsubseteq x(m', g(m'))$ whenever
$(m', g) \geq (m_0, f)$. Hence, $a \sqsubseteq x \circ r(m', g)$ whenever $(m', g) \geq (m_0, f)$, and it
follows that $(x \circ r, s) \in \mathcal{C}$, as required.

Next, we verify that the topology induced on $D$ by the convergence condition
coincides with the Scott topology on $D$. Let $O$ be open in the topology
associated with the convergence class $\mathcal{C}$, let $x \in O$, and suppose that $x \sqsubseteq y$;
suppose further that $y \notin O$, that is, suppose that $y$ is in the closed set $D \setminus O$.
Then there is a net $s_i \to y$ with $s_i \in D \setminus O$ for all $i$. Let $a \in \mathrm{approx}(x)$ be
arbitrary. Then $a \in \mathrm{approx}(y)$ and, hence, $a \sqsubseteq s_i$ eventually. It follows from this
that $s_i \to x$. Therefore, by (b) of Theorem A.3.5, we see that $s_i$ is eventually
in $O$. This contradiction shows that $y$ is, in fact, in $O$. Next, suppose that $A$
is a directed set with $x = \bigsqcup A \in O$. Then by Proposition A.6.1 we have that,
as a net, $A \to x$. Therefore, $A$ is eventually in $O$, and so $A \cap O \neq \emptyset$. Hence, $O$
is a Scott-open set.

Conversely, suppose that $O$ is a Scott-open set, and let $x \in O$. We show
that $O$ is open in the topology associated with the convergence class $\mathcal{C}$ by
establishing that, whenever $s_i \to x$, we have $s_i$ eventually in $O$, and then the
result follows from (b) of Theorem A.3.5 again. Now, $\mathrm{approx}(x)$ is a directed
set, and $x = \bigsqcup \mathrm{approx}(x) \in O$. Therefore, there is an element $a \in \mathrm{approx}(x)$
such that $a \in O$. Since $s_i \to x$, it now follows that there is $i_0$ such that for
$i_0 \leq i$ we have $a \sqsubseteq s_i$. But then, since $a \in O$ and $O$ is Scott open, we have
$s_i \in O$ whenever $i_0 \leq i$, as required to finish the proof. ∎

Of course, a function $f : D \to E$ from a domain $D$ to a domain $E$ is called
Scott continuous if it is continuous in the Scott topologies on $D$ and $E$. However,
it is well-known that a function $f$ between domains is Scott continuous
if and only if it is continuous in the sense of Definition 1.1.7, see Proposition
A.6.4. Moreover, by virtue of Theorem 1.3.2 and Proposition A.6.5, we
have the following result.


3.2.5 Proposition Suppose that the truth set $\mathcal{T}$ is a domain. Then in the
Scott topology $I(X, \mathcal{T})$ is a compact $T_0$ topological space, but is not $T_1$ in
general.


Nets (and convergence classes), like sequences, are normally simple to han- 
dle, and their use makes checking continuity relatively straightforward, as we 
will see later on in several places. However, we move next to consider the
significance of Theorem 3.2.4 in the case of spaces $I(X, \mathcal{T})$ of valuations, where
the set $(\mathcal{T}, \leq)$ of truth values is a domain. Indeed, suppose that $(\mathcal{T}, \leq)$ is a
domain and that the net $(v_i)$ converges to $v$ in the Scott topology on the domain
$I(X, \mathcal{T})$. According to Theorem 3.2.4, this holds if and only if for each finite
valuation $u$ with $u \sqsubseteq v$, there is an index $i_0$ such that $u \sqsubseteq v_i$ whenever $i_0 \leq i$.
In fact, when applied to the particular truth sets discussed in Section 1.3.2,
Theorem 3.2.4 gives the following result.


3.2.6 Theorem Suppose that $(I_i)$ is a net of interpretations and that $I$ is an
interpretation.


(a) Let $\mathcal{T}$ denote the truth set TWO. Then, in the ordering $\sqsubseteq_t$ on $I(X, \mathcal{T})$, we
have that $(I_i)$ converges to $I$ in the Scott topology if and only if whenever
$x \in I$, eventually $x \in I_i$.


(b) Let $\mathcal{T}$ denote the truth set THREE. Then the following statements hold.


(i) In the ordering $\sqsubseteq_k$ on $I(X, \mathcal{T})$, we have that $(I_i)$ converges to $I$ in
the Scott topology if and only if whenever $x \in I_{\mathbf{t}}$, eventually $x \in I_{i,\mathbf{t}}$,
and whenever $x \in I_{\mathbf{f}}$, eventually $x \in I_{i,\mathbf{f}}$.


(ii) In the ordering $\sqsubseteq_t$ on $I(X, \mathcal{T})$, we have that $(I_i)$ converges to $I$ in
the Scott topology if and only if whenever $x \in I_{\mathbf{t}}$, eventually $x \in I_{i,\mathbf{t}}$,
and whenever $x \in I_{\mathbf{u}}$, eventually $x \in I_{i,\mathbf{u}} \cup I_{i,\mathbf{t}}$.


(c) Let $\mathcal{T}$ denote the truth set FOUR. Then the following statements hold.


(i) In the ordering $\sqsubseteq_k$ on $I(X, \mathcal{T})$, we have that $(I_i)$ converges to $I$
in the Scott topology if and only if whenever $x \in I_{\mathbf{t}}$, eventually
$x \in I_{i,\mathbf{t}} \cup I_{i,\mathbf{b}}$, whenever $x \in I_{\mathbf{f}}$, eventually $x \in I_{i,\mathbf{f}} \cup I_{i,\mathbf{b}}$, and whenever
$x \in I_{\mathbf{b}}$, eventually $x \in I_{i,\mathbf{b}}$.


(ii) In the ordering $\sqsubseteq_t$ on $I(X, \mathcal{T})$, we have that $(I_i)$ converges to $I$
in the Scott topology if and only if whenever $x \in I_{\mathbf{u}}$, eventually
$x \in I_{i,\mathbf{u}} \cup I_{i,\mathbf{t}}$, whenever $x \in I_{\mathbf{b}}$, eventually $x \in I_{i,\mathbf{b}} \cup I_{i,\mathbf{t}}$, and whenever
$x \in I_{\mathbf{t}}$, eventually $x \in I_{i,\mathbf{t}}$.


Proof: We prove the first of the claims in (c), with the others being proved
similarly. Let $v$ denote the valuation corresponding to the interpretation $I$,
and, for each index $i$, let $v_i$ denote the valuation corresponding to the
interpretation $I_i$. Suppose first that $(v_i)$ converges to $v$ in the Scott topology
on $I(X, \mathcal{T})$, and let $x \in X$. Suppose further that $x \in v_{\mathbf{t}}$, so that $v(x) = \mathbf{t}$.
Define $u \in I(X, \mathcal{T})$ by $u(x) = \mathbf{t}$, and, for $y \neq x$, set $u(y) = \mathbf{u}$. Then $u$ is a
finite element satisfying $u \sqsubseteq_k v$. Therefore, by Theorem 3.2.4, there exists $i_0$
such that $u \sqsubseteq_k v_i$ whenever $i \geq i_0$, and hence eventually either $v_i(x) = \mathbf{t}$ or
$v_i(x) = \mathbf{b}$. Thus, eventually $x \in v_{i,\mathbf{t}} \cup v_{i,\mathbf{b}}$. A similar argument holds in case
$x \in v_{\mathbf{f}}$ or $x \in v_{\mathbf{b}}$, and hence we obtain the stated condition.

Conversely, suppose that the given condition holds. Let $u$ be a finite valuation
such that $u \sqsubseteq_k v$, and suppose further that $u$ takes value $\mathbf{u}$ at all points of
$X$ except possibly at one point $x$, say. Let us first suppose that $u(x) = \mathbf{t}$. Then
either $x \in v_{\mathbf{t}}$ or $x \in v_{\mathbf{b}}$. But then, by the given condition, either eventually
$x \in v_{i,\mathbf{t}} \cup v_{i,\mathbf{b}}$ or eventually $x \in v_{i,\mathbf{b}}$, and in either case, eventually $u \sqsubseteq_k v_i$. A
similar argument holds in case $u(x) = \mathbf{f}$ or $u(x) = \mathbf{b}$. By a standard argument
using the directedness of the index set of the net $(v_i)$, it follows that, for any
finite valuation $u \sqsubseteq_k v$, we have eventually $u \sqsubseteq_k v_i$. Hence, $(v_i)$ converges to
$v$ in the Scott topology on $I(X, \mathcal{T})$, as required. ∎


Thus, we obtain a uniform description of net convergence in the Scott
topology on $(I(X, \mathcal{T}), \sqsubseteq)$, where $(\mathcal{T}, \leq)$ is any one of the main sets of truth
values which are important in logic programming. Indeed, the convergence
conditions involved are simple, natural, and intuitive, and this is one of the 
advantages of approaching this topic via convergence. 

In fact, it is Part (a) of Theorem 3.2.6 which we will use most often, and 
we illustrate its use next with an example. 


3.2.7 Example The following statements concerning convergence in the
Scott topology hold in two-valued logic.¹⁰


(1) Any net $(I_\lambda)$ of interpretations converges to the empty interpretation $\emptyset$.


(2) If $(I_\lambda)$ is a net of interpretations which is monotonic in the sense that
$I_\lambda \subseteq I_\gamma$ whenever $\lambda \leq \gamma$, then $(I_\lambda)$ converges to $\bigcup_\lambda I_\lambda$.


(3) If a net $(I_\lambda)$ of interpretations converges to an interpretation $I$, and $J \subseteq I$,
then $(I_\lambda)$ converges to $J$. Thus, in general, a net $(I_\lambda)$ of interpretations
has many limits. A specific example of this can be given as follows. Suppose
that $\mathcal{L}$ is a first-order language containing a unary predicate symbol
$p$, a unary function symbol $s$, and a constant symbol $a$, such as the
language underlying Example 3.2.3, say. Consider the sequence $(I_n)$ of
interpretations defined as follows: $I_n$ is the set $\{p(a), p(s(a))\}$ if $n$ is even
and is the set $\{p(a), p(s(a)), p(s^2(a))\}$ if $n$ is odd. Then $(I_n)$ converges to
each of the interpretations $\emptyset$, $\{p(a)\}$, $\{p(s(a))\}$, $\{p(a), p(s(a))\}$, but not
to $\{p(a), p(s(a)), p(s^2(a))\}$.

Again, if $I_n$ is the interpretation defined by taking it to be the set
$\{p(a), p(s(a)), \dots, p(s^n(a))\}$ if $n$ is even and taking it to be the set
$\{p(a), p(s(a)), \dots, p(s^{2n}(a))\}$ if $n$ is odd, then the sequence $(I_n)$ converges
to the interpretation $\{p(a), p(s(a)), p(s^2(a)), \dots\}$; note that $(I_n)$ is not
monotonic in the sense of Part (2).


¹⁰ For further results in this direction, see [Seda, 1995].
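
The first sequence in Part (3) can be checked mechanically. The sketch below is restricted,
by assumption, to sequences that eventually repeat a finite non-empty cycle of
interpretations forever, since only for such sequences is "eventually $x \in I_n$" decidable
by inspection; the second sequence of the example is not of this kind. The criterion used is
the one stated in Part (a) of Theorem 3.2.6, and the function names are hypothetical.

```python
from itertools import chain, combinations

def eventual_atoms(cycle):
    """Atoms that are eventually in I_n, for a sequence that from some point on
    repeats the given non-empty list of interpretations forever."""
    return frozenset.intersection(*[frozenset(s) for s in cycle])

def converges_to(cycle, candidate):
    """Scott convergence in two-valued logic: every atom of the candidate
    limit must eventually belong to the members of the sequence."""
    return frozenset(candidate) <= eventual_atoms(cycle)

def all_limits(cycle):
    """Every limit of the sequence, that is, every subset of eventual_atoms."""
    atoms = sorted(eventual_atoms(cycle))
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1))
    return {frozenset(s) for s in subsets}

# The alternating sequence of Part (3): even-indexed and odd-indexed members.
EVEN = {"p(a)", "p(s(a))"}
ODD = {"p(a)", "p(s(a))", "p(s(s(a)))"}
LIMITS = all_limits([EVEN, ODD])
```

Here LIMITS consists of the four subsets of {p(a), p(s(a))}, and the odd-indexed
interpretation itself is not a limit, in agreement with the example.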


Although we have taken convergence as the basic concept, it is easy to ex- 
hibit properties of the Scott topology in other familiar terms, as the following 
example shows. 


3.2.8 Example In the context of spaces of interpretations, Proposition 3.2.2
gives a simple description of the basic open sets in the Scott topology, and
we briefly consider this point here. In the case of TWO, for example, let
$A_1, \dots, A_n \in X$ and let $G(A_1, \dots, A_n) = \{I \in I(X, \mathit{TWO}) \mid A_1, \dots, A_n \in I\}$.
By means of (f) of Theorem 1.3.2 and (c) of Proposition 3.2.2, it is
clear that the sets $G(A_1, \dots, A_n)$ form a base for the Scott topology on
$I(X, \mathit{TWO})$. Indeed, the sets $G(A) = \{I \in I(X, \mathit{TWO}) \mid A \in I\}$ form a
subbase for the Scott topology, since $G(A_1, \dots, A_n) = \bigcap_{i \in \{1, \dots, n\}} G(A_i)$. As
another example, consider this time the knowledge ordering $\sqsubseteq_k$ in the case of
THREE. Take elements $A_1, \dots, A_n, B_1, \dots, B_m \in X$, where $n, m \geq 0$, and
let $G(A_1, \dots, A_n; B_1, \dots, B_m)$ be the set $\{I \in I(X, \mathit{THREE}) \mid A_1, \dots, A_n \in
I_{\mathbf{t}}$ and $B_1, \dots, B_m \in I_{\mathbf{f}}\}$. Then these sets form a base for the Scott topology
on $I(X, \mathit{THREE})$. Indeed, the sets $G(A; B)$ clearly form a subbase for this
topology, where $G(A; B) = \{I \in I(X, \mathit{THREE}) \mid A \in I_{\mathbf{t}}$ and $B \in I_{\mathbf{f}}\}$.
The other cases dealt with in Section 1.3.2 can be treated similarly.
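
For the two-valued case, the relationship between the basic and subbasic open sets can be
illustrated over a small finite set. In the Python sketch below, the three-atom set X and
the function name G (mirroring the notation of the example) are assumptions made for
illustration only.

```python
from itertools import combinations

# Assumed finite set X of atoms; an interpretation is the set of its true atoms.
X = ["p(a)", "p(b)", "p(c)"]
INTERPRETATIONS = [frozenset(c) for r in range(len(X) + 1)
                   for c in combinations(X, r)]

def G(*atoms):
    """All interpretations in which every one of the given atoms is true."""
    required = frozenset(atoms)
    return frozenset(i for i in INTERPRETATIONS if required <= i)

# A basic open set is the intersection of the subbasic sets of its atoms.
basic = G("p(a)", "p(b)")
from_subbase = G("p(a)") & G("p(b)")
```

With no atoms at all, G() is the whole space, which corresponds to the case $n = 0$.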


We turn next to consider the continuity of the immediate consequence 
operator in the Scott topology. By virtue of (a) of Theorem 2.2.3 and Propo- 
sition A.6.4, we have immediately that Tp is Scott continuous whenever P is a 
definite program. However, we will take the trouble to include a self-contained 
proof of this fact next. 


3.2.9 Theorem Let $P$ be a definite program. Then $T_P$ is continuous in the
Scott topology on $I_{P,2}$.


Proof: Let $I \in I_{P,2}$, and let $I_i \to I$ be a net converging to $I$ in the Scott
topology; we show that $T_P(I_i) \to T_P(I)$ in the Scott topology. If $T_P(I) = \emptyset$,
then the required conclusion is immediate since, by Theorem 3.2.4, every net
in a domain converges in the Scott topology to the bottom element. So suppose
that $T_P(I) \neq \emptyset$, and let $A$ belong to $T_P(I)$. Then there is a ground instance
$A \leftarrow A_1, \dots, A_n$ of a clause in $P$ such that $I(A_1 \wedge \dots \wedge A_n) = \mathbf{t}$, where
$n \geq 0$. Since $I_i \to I$, we have, by (a) of Theorem 3.2.6, that eventually
$I_i(A_1 \wedge \dots \wedge A_n) = \mathbf{t}$. Therefore, $A \in T_P(I_i)$ eventually. It now follows from
Theorem 3.2.6 that $T_P(I_i) \to T_P(I)$ in the Scott topology, as required. ∎

It is not difficult to see that the converse of the previous result fails. For
example, the program $P_1$ with clauses $p(a) \leftarrow p(a)$, $p(a) \leftarrow \neg p(a)$, and $p(b) \leftarrow p(a)$
and the program $P_2$ with clauses $p(a) \leftarrow$ and $p(b) \leftarrow p(a)$ have the same
(Scott continuous) immediate consequence operator.
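
That the two programs share one operator can be confirmed by exhaustive comparison over
the four interpretations of the Herbrand base {p(a), p(b)}. The Python sketch below uses an
assumed clause encoding (head, positive body, negated body) and hypothetical helper names;
tp implements the usual two-valued immediate consequence operator for ground programs.

```python
from itertools import combinations

def clause(head, pos=(), neg=()):
    # Assumed encoding: (head, positive body atoms, negated body atoms).
    return (head, frozenset(pos), frozenset(neg))

def tp(program, interp):
    """Heads of all ground clauses whose body is true in interp."""
    return frozenset(head for (head, pos, neg) in program
                     if pos <= interp and not (neg & interp))

def powerset(atoms):
    atoms = list(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

BASE = ["p(a)", "p(b)"]
P1 = [clause("p(a)", pos=["p(a)"]),   # p(a) <- p(a)
      clause("p(a)", neg=["p(a)"]),   # p(a) <- not p(a)
      clause("p(b)", pos=["p(a)"])]   # p(b) <- p(a)
P2 = [clause("p(a)"),                 # p(a) <-
      clause("p(b)", pos=["p(a)"])]   # p(b) <- p(a)

same_operator = all(tp(P1, i) == tp(P2, i) for i in powerset(BASE))
```

The comparison succeeds because the first two clauses of $P_1$ together make p(a) a
consequence of every interpretation, exactly as the fact in $P_2$ does.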

By contrast, recall that Program 2.4.12 showed that the Fitting operator 
is not order continuous, and hence not Scott continuous, for definite pro- 
grams. Nevertheless, Theorem 3.2.9 justifies our earlier statement that the 
Scott topology naturally underpins definite programs. 

A theme which is important in this chapter and in later ones concerns the 
convergence to some interpretation $I$ of sequences $T_P^n(M)$ of iterates of $T_P$
on an interpretation $M$, and under what conditions $I$ is a model for $P$. We
discuss this briefly now for definite programs and take it up in more detail in 
the next section for normal programs. 

In general, if $(v_i)$ is a net converging to $v$ in the Scott topology on $I(X,\mathcal{T})$,
then it is clear from Theorem 3.2.4, see also Example 3.2.7, that $(v_i)$ converges
to $u$ whenever $u \sqsubseteq v$ and, hence, that the set of limits of $(v_i)$ is downwards
closed.¹¹ Indeed, since $(v_i)$ always converges to $\bot$, this latter set is always
non-empty also. Furthermore, when $\mathcal{T}$ denotes the complete lattice TWO, we
have by Theorem 1.3.4 that $I(X,\mathcal{T})$ is itself a complete lattice. Thus, in this
case, the supremum of the set of all limits, in the Scott topology, of a net $(v_i)$
exists and is easily seen to be a limit of $(v_i)$ also, by Theorem 3.2.6. We refer
to this limit as the greatest limit of $(v_i)$ and denote it by $\mathrm{gl}(v_i)$. In fact, it
is readily checked that $\mathrm{gl}(v_i)$ takes value $t$ precisely on the set of all $x \in X$
at which eventually $v_i$ takes value $t$, and this property completely determines
$\mathrm{gl}(v_i)$, see [Seda, 1995] for more details.

Of course, a sequence $(T_P^n(M))$ always converges to the empty interpretation
$\emptyset$, as already noted, but the interpretation $\emptyset$ need not be a model for $P$.
However, we do have the following result.


3.2.10 Proposition Let $P$ be a definite logic program, and let $M$ be an
interpretation for $P$. Then the greatest limit $\mathrm{gl}(T_P^n(M))$ of the sequence
$(T_P^n(M))$ is a model for $P$.


Proof: Let $I$ denote $\mathrm{gl}(T_P^n(M))$. Then the sequence $(T_P^n(M))$ converges to
$I$ in the Scott topology. Hence, by the Scott continuity of $T_P$, the sequence
$(T_P(T_P^n(M)))$ converges to $T_P(I)$. Thus, $(T_P^n(M))$ converges to $T_P(I)$, and we
obtain, by definition of the greatest limit, that $T_P(I) \subseteq I$, as required. ∎


Finally, we note that if we take $M$ to be the bottom element in $I(X,\mathcal{T})$,
then $\mathrm{gl}(T_P^n(M))$ coincides with the least fixed point of $T_P$ and, hence, is the
least model for the definite logic program $P$, see [Seda, 1995]. Thus, the usual
two-valued semantics for definite programs can be expressed entirely in terms
of convergence in the Scott topology.


¹¹ A subset $O$ of a partially ordered set $(D, \sqsubseteq)$ is called downwards closed if,
whenever $x \in O$ and $y \sqsubseteq x$, we have $y \in O$.

We close this section with an example which, despite its simplicity,
illustrates the main points discussed previously.


3.2.11 Example Consider again the program $P$ of Example 3.2.3.


$p(a) \leftarrow$
$p(s(X)) \leftarrow p(X)$


Let $M = \emptyset$, thought of as a two-valued interpretation, and let $I_n$ denote
the $n$-th iterate of $T_P$ on $M$. Then $I_n = \{p(a), p(s(a)), \ldots, p(s^{n-1}(a))\}$ for
any $n \geq 1$. By Part (2) of Example 3.2.7, the sequence $(I_n)$ converges in the
Scott topology to the set $I = \{p(a), p(s(a)), \ldots, p(s^n(a)), \ldots\}$ of all natural
numbers. Moreover, $I$ is clearly the greatest limit of the sequence $(I_n)$ and,
hence, by Proposition 3.2.10, is a model for $P$. Indeed, by the comments
immediately prior to this example, $I$ is the least model for $P$ by the results of
[Seda, 1995].
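The iteration in this example can be imitated on a finite window of the Herbrand base. The sketch below is illustrative only and rests on several assumptions that are not part of the text: the atom $p(s^k(a))$ is encoded as the integer `k`, the cut-off `BOUND` is a hypothetical truncation, and `eventually_true` approximates the greatest limit by intersecting the last few computed iterates.

```python
BOUND = 8  # hypothetical cut-off: only atoms p(s^k(a)) with k < BOUND are tracked

def tp(interp):
    """T_P for  p(a) <-  and  p(s(X)) <- p(X), restricted to the window."""
    derived = {0}                                          # the fact p(a) <-
    derived |= {k + 1 for k in interp if k + 1 < BOUND}    # p(s(X)) <- p(X)
    return frozenset(derived)

def iterates(start, steps):
    """The list [M, T_P(M), T_P^2(M), ...] with steps + 1 entries."""
    seq = [frozenset(start)]
    for _ in range(steps):
        seq.append(tp(seq[-1]))
    return seq

def eventually_true(seq, window=3):
    """Finite stand-in for the greatest limit: atoms true in each of the last `window` iterates."""
    return frozenset.intersection(*seq[-window:])

seq = iterates(frozenset(), 12)
limit = eventually_true(seq)
is_model = tp(limit) <= limit   # the model condition T_P(I) is a subset of I
```

Within the window, the iterates grow by one atom per step and then stabilise, so the approximated greatest limit is the whole window and satisfies the model condition, in line with Proposition 3.2.10.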


3.3 The Cantor Topology on Spaces of Valuations 


As just noted in the previous section, one of the sources of motivation for 
studying topology in relation to logic programming is the role of convergence 
of sequences of iterates of the immediate consequence operator in relation 
to semantics and also, in fact, in relation to termination. We take this dis- 
cussion further now, but this time in the context of normal programs and the 
construction of certain standard models for them, and in more detail in Chap- 
ter 5. We also refer the reader to Chapter 5 for details of how convergence 
enters into questions concerned with the so-called acceptable programs and 
problems concerned with termination, see Corollary 5.2.5, Proposition 5.2.7, 
Theorem 5.2.8 and Theorem 5.4.14, for example. 

We begin with a result concerning product topologies. 

Let $X$ and $Y$ be arbitrary sets, and let $[X \to Y]$ denote the set of all total
functions mapping $X$ into $Y$. When $Y$ is ordered, perhaps as a set of truth
values $\mathcal{T}$, then so is $[X \to Y]$, and, as we have just seen, important topologies
can be defined on $[X \to Y]$ by quite natural convergence conditions which
make use of the order. However, important topologies can also be defined on
$[X \to Y]$ using natural convergence conditions which do not depend on any
order, as we show next.¹²


3.3.1 Theorem Let $(s_i)$ be a net in $[X \to Y]$, and let $s \in [X \to Y]$. Then
the condition

$\lim_i s_i = s$ (C) if and only if for each $x \in X$ eventually $s_i(x) = s(x)$

determines a convergence class on $[X \to Y]$ whose associated topology $Q$ is
the product of $X$ copies of the discrete topology on $Y$.


¹² Again, see [Seda, 2002].


Proof: We must verify that the conditions (1), (2), (3), and (4) in the
definition of a convergence class, see Definition 3.1.2, hold with the given
meaning of $\lim_i s_i = s$ (C).

(1) Suppose that $s_i = s$ for all $i \in \mathcal{I}$ is a constant net. Then $s_i(x) = s(x)$
for all $x$ and all $i$. Hence, for all $x$, eventually $s_i(x) = s(x)$, and so $((s_i), s) \in C$.

(2) Suppose that $((s_i), s) \in C$ and that $(t_j)_{j \in \mathcal{J}}$ is a subnet of $(s_i)_{i \in \mathcal{I}}$. Let
$x \in X$ be arbitrary, and let $i_0$ be such that $s_i(x) = s(x)$ for all $i \geq i_0$. Since
$(t_j)$ is a subnet of $(s_i)$, there is $\phi : \mathcal{J} \to \mathcal{I}$ and $j_0 \in \mathcal{J}$ such that $i_0 \leq \phi(j)$
whenever $j_0 \leq j$. But then, if $j_0 \leq j$, we have $t_j(x) = s_{\phi(j)}(x) = s(x)$, and
hence $((t_j), s) \in C$.

(3) Suppose that $(s_i)_{i \in \mathcal{I}}$ does not converge (C) to $s$. Then there is $x \in X$
and a cofinal subset $\mathcal{J}$ of $\mathcal{I}$ such that, whenever $j \in \mathcal{J}$, we have $s_j(x) \neq s(x)$.
Let $t_j = s_j$ for each $j \in \mathcal{J}$. Then $(t_j)$ is a subnet of $(s_i)$, and clearly no subnet
of $(t_j)$ converges (C) to $s$.

(4) Suppose that the conditions stated in (4) of Definition 3.1.2 all hold
and that $\lim_m \lim_n x(m,n) = s$ (C), where $x : F' \to [X \to Y]$. Consider the
net $x \circ r : F \to [X \to Y]$. Let $y \in X$ be arbitrary. Since $\lim_m \lim_n x(m,n) = s$ (C),
there is $m_0 \in \mathcal{I}$ such that, for all $m \geq m_0$, $\lim_n x(m,n) = s_m$ (C)
for some $s_m \in [X \to Y]$, and $\lim_m s_m = s$ (C). Therefore, for $m \geq m_0$,
there is $n_m \in \mathcal{J}_m$ such that $x(m,n)(y) = s_m(y)$ for all $n \geq n_m$. But for
$m \geq m_0$, $s_m(y) = s(y)$. Define $f \in \prod_{m \in \mathcal{I}} \mathcal{J}_m$ by setting $f(m) = n_m \in \mathcal{J}_m$
whenever $m \geq m_0$ and otherwise letting $f(m) \in \mathcal{J}_m$ be arbitrary. Suppose
$(m,g) \geq (m_0, f)$. Then $m \geq m_0$ and $g \geq f$ so that $g(m) \geq f(m) = n_m$. But
then we have $x(m, g(m))(y) = s_m(y) = s(y)$. In other words, $(x \circ r)(m,g)(y) =
x(m, g(m))(y) = s(y)$ whenever $(m,g) \geq (m_0, f)$. Thus, $(x \circ r)(y)$ is eventually
equal to $s(y)$, and hence $x \circ r$ converges (C) to $s$.

Finally, viewing $[X \to Y]$ as the product $\prod_{x \in X} Y_x$, where $Y_x = Y$ for each
$x \in X$, then, as is well-known, a net $(s_i)$ converges in such a product to $s$ if and
only if $s_i(x) \to s(x)$ in $Y$ for each $x$, see Theorem A.5.2 (e). But, given that
$Y$ is endowed with the discrete topology, this latter condition $s_i(x) \to s(x)$
holds if and only if $s_i(x)$ is eventually equal to $s(x)$, as required. ∎


Theorem 3.3.1 holds with $X$ taken as $B_P$ ($= B_{P,J}$), where $P$ is a normal
logic program, and $Y$ taken as any set $\mathcal{T}$ of truth values, and in particular it
holds with $\mathcal{T}$ taken as TWO. With these choices, we obtain the following
result, which is analogous to Proposition 3.2.10, but applies to normal programs
in general.


3.3.2 Proposition Let $P$ be a normal logic program. Suppose that $C$ is any
convergence class on $I_{P,2}$ whose elements satisfy the condition stated in
Theorem 3.3.1:

if $((I_i), I) \in C$, then, for each $A \in B_P$, eventually $I_i(A) = I(A)$.

Then, whenever $M$ is an interpretation for $P$ such that $((T_P^n(M)), I) \in C$, we
have that $I$ is a model for $P$.


Proof: By Proposition 2.2.2, it suffices to show that $T_P(I) \sqsubseteq I$. So, suppose
therefore that $T_P(I)(A) = t$. Then there is a ground instance
$A \leftarrow A_1, \ldots, A_n, \neg B_1, \ldots, \neg B_m$ of a clause in $P$ such that
$I(A_1 \wedge \ldots \wedge A_n \wedge \neg B_1 \wedge \ldots \wedge \neg B_m) = t$. Taking the sequence $(T_P^n(M))$,
we have, by the property stated in the hypothesis (applied to each literal in the
conjunction under consideration), that eventually
$T_P^n(M)(A_1 \wedge \ldots \wedge A_n \wedge \neg B_1 \wedge \ldots \wedge \neg B_m) =
I(A_1 \wedge \ldots \wedge A_n \wedge \neg B_1 \wedge \ldots \wedge \neg B_m) = t$. Therefore, eventually $T_P^n(M)(A) = t$,
and, by the property stated in the hypothesis again, we obtain $I(A) = t$.
Hence, whenever $T_P(I)(A) = t$, we have $I(A) = t$. Thus, $T_P(I) \sqsubseteq I$, as
required. ∎


3.3.3 Remark (1) Theorem 3.3.1 shows that the largest convergence class 
C to which Proposition 3.3.2 applies is the convergence class C(Q) deter- 
mined by the topology Q. Therefore, Q is the coarsest topology among 
the topologies determined by those convergence classes to which Proposi- 
tion 3.3.2 can be applied. 


(2) In topological terms, Proposition 3.3.2 says that if $M$ is an interpretation
for a normal logic program $P$ such that the sequence $(T_P^n(M))$ of iterates
converges in the topology $Q$ to some interpretation $I$ for $P$, then $I$ is a
model for $P$.


In fact, we note that the construction of the perfect model semantics for 
locally stratified programs P, which we give in Chapter 6, rests on the second 
of the facts stated in the previous remark. 

Notice that Proposition 3.3.2 holds in any convergence class contained in
$C(Q)$. In other words, it holds for any convergence class determined by a
topology finer than $Q$. Furthermore, $Q$ is not the only naturally definable topology
determined by a convergence class for which Proposition 3.3.2 holds. For
example, if we define $\lim_i v_i = v$ (C) to mean that eventually $v_i = v$, we obtain
another natural convergence class which trivially satisfies Proposition 3.3.2,
and this convergence class generates the discrete topology on $I(X,\mathcal{T})$.

Next, we want to investigate the properties of I(X, T) when endowed with 
the topology Q, and indeed the representation of Q given in Theorem 3.3.1 as 
a product space makes this relatively easy. 


3.3.4 Theorem Let $P$ be a normal logic program, let $J$ be a preinterpretation
for $P$ with domain $D$, let $X = B_{P,J}$, and let $\mathcal{T}$ be a truth set endowed
with the discrete topology. Then in the topology $Q$ on $I(X,\mathcal{T})$ we have the
following results.

(a) A net $(I_i)$ of interpretations converges to an interpretation $I$ if and only
if, for each ground atom $A$, we have that $I_i(A)$ is eventually equal to $I(A)$.

(b) $I(X,\mathcal{T})$ is a totally disconnected Hausdorff space.

(c) $I(X,\mathcal{T})$ is compact if and only if $\mathcal{T}$ is a finite set.

(d) $I(X,\mathcal{T})$ is metrizable¹³ if and only if $D$ is countable.

(e) $I(X,\mathcal{T})$ is second countable if and only if $D$ and $\mathcal{T}$ are both countable.

(f) Suppose that $D$ is denumerable and that $\mathcal{T}$ is finite. Then $I(X,\mathcal{T})$ is
homeomorphic to the Cantor set in the closed unit interval within the
real line.


Proof: Statement (a) follows immediately from Theorem 3.3.1, and all the
remaining statements follow from general and well-known results concerning
product spaces, see the Appendix. Specifically, they can be found in
[Willard, 1970], where unfamiliar terms are also defined, as follows: for (b),
see Page 72, Theorem 13.8, and Page 210, Theorem 29.3; (c) follows from
Tychonoff's theorem (Page 120, Theorem 17.8) and the fact that a discrete space
is compact if and only if it is finite; for (d), see Page 161, Theorem 22.3; for
(e), see Page 108, Theorem 16.2; and finally, for (f), see Page 217, Corollary
30.6. ∎


Because of Part (f) of Theorem 3.3.4, we refer to the topology Q as the 
Cantor topology. 

Notice that I(X,T) is a Hausdorff space in the topology Q, and hence 
the limit of any net convergent in Q is unique, see Theorem A.4.2, unlike the 
situation in the Scott topology where a convergent net has many limits in 
general, as shown by Example 3.2.7. 

In the case of two-valued interpretations, we have the following result. It 
follows immediately from Part (a) of Theorem 3.3.4 and will be used quite 
often later on. 


3.3.5 Proposition A net $(I_i)$ of interpretations in $I_{P,2}$ converges to $I$ in the
topology $Q$ if and only if whenever $A \in I$, eventually $A \in I_i$, and whenever
$A \notin I$, eventually $A \notin I_i$. Moreover, the unique limit $I$ coincides with the set
$\{A \in B_P \mid A \text{ eventually belongs to } I_i\}$.


The following example illustrates Proposition 3.3.5. 


¹³ Metrics are defined in Section 4.2 and studied extensively in Chapter 4. A topological
space is said to be metrizable if its open sets can be defined in terms of some metric as
discussed in Section 4.1. The representation of $Q$ as a product topology makes it easy to
determine metrics for $Q$, see [Seda, 1995].


3.3.6 Example Consider again the program Even, see Program 2.1.3. To
ease notation here, it will be convenient to denote this program by $P$ and also
to replace the predicate symbol even by $p$. Thus $P$ denotes the program


$p(a) \leftarrow$
$p(s(X)) \leftarrow \neg p(X)$


We consider the iterates of $T_P$ on the interpretation $\emptyset$, as follows.


$T_P^0(\emptyset) = \emptyset$
$T_P^1(\emptyset) = \{p(a), p(s(a)), p(s^2(a)), p(s^3(a)), p(s^4(a)), \ldots\}$
$T_P^2(\emptyset) = \{p(a)\}$
$T_P^3(\emptyset) = \{p(a), p(s^2(a)), p(s^3(a)), p(s^4(a)), p(s^5(a)), \ldots\}$
$T_P^4(\emptyset) = \{p(a), p(s^2(a))\}$
$T_P^5(\emptyset) = \{p(a), p(s^2(a)), p(s^4(a)), p(s^5(a)), p(s^6(a)), \ldots\}$
$T_P^6(\emptyset) = \{p(a), p(s^2(a)), p(s^4(a))\}$
$T_P^7(\emptyset) = \{p(a), p(s^2(a)), p(s^4(a)), p(s^6(a)), p(s^7(a)), \ldots\}$
$T_P^8(\emptyset) = \{p(a), p(s^2(a)), p(s^4(a)), p(s^6(a))\}$


and so on. On letting $I_n$ denote $T_P^n(\emptyset)$ and also letting $I$ denote the set
$\{p(a), p(s^2(a)), p(s^4(a)), \ldots\}$ of "even" natural numbers, we note that the
sequence $(I_n)$ oscillates quite wildly about $I$. Nevertheless, it is easy to see by
means of Proposition 3.3.5 that $(I_n)$ converges in $Q$ to $I$. Therefore, by
Remark 3.3.3, $I$ is a model for $P$. Indeed, $I$ is a fixed point of $T_P$ and is the
unique supported model for $P$.
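The oscillation and the pointwise stabilisation described in this example can be observed on a finite window of the Herbrand base. What follows is a sketch under stated assumptions rather than part of the original development: the atom $p(s^k(a))$ is encoded as the integer `k`, `BOUND` is a hypothetical cut-off, and `stabilisation_index` is a helper name introduced only for illustration.

```python
BOUND = 10  # hypothetical cut-off: only atoms p(s^k(a)) with k < BOUND are tracked

def tp(interp):
    """T_P for  p(a) <-  and  p(s(X)) <- not p(X), restricted to the window."""
    return frozenset({0} | {k + 1 for k in range(BOUND - 1) if k not in interp})

def iterates(steps):
    """The list [I_0, I_1, ..., I_steps] with I_0 the empty interpretation."""
    seq = [frozenset()]
    for _ in range(steps):
        seq.append(tp(seq[-1]))
    return seq

def stabilisation_index(seq, atom):
    """Smallest n0 such that membership of `atom` is constant on seq[n0:]."""
    final = atom in seq[-1]
    n0 = len(seq) - 1
    while n0 > 0 and (atom in seq[n0 - 1]) == final:
        n0 -= 1
    return n0

seq = iterates(2 * BOUND)
evens = frozenset(range(0, BOUND, 2))
# For each atom, the step from which its truth value no longer changes.
settle = [stabilisation_index(seq, k) for k in range(BOUND)]
```

In this encoding each atom changes its truth value only finitely often, which is exactly the criterion of Proposition 3.3.5, even though consecutive iterates differ on many atoms at once.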

In fact, the oscillatory behaviour exhibited in this example in relation to 
the single-step operator is typical of programs containing negation. Indeed, 
for this example, Tp is not Scott continuous, and therefore Theorem 1.1.9 is 
not applicable to Tp. Hence, the theory developed for the semantics of definite 
programs in Chapter 2 is not applicable here either. 


3.3.7 Example It is immediate from (a) of Theorem 3.2.6 and Proposition 3.3.5
that whenever a net $(I_i)$ converges to $I$ in $Q$, then it converges
to $I$ in the Scott topology, and this is borne out by Example 3.2.8 and
Corollary 3.3.10, just below, which show that the topology $Q$ is finer than the Scott
topology in the case of two-valued interpretations.

On the other hand, the sequence $(I_n)$ defined in the first paragraph of (3)
of Example 3.2.7 converges in the Scott topology (to several interpretations),
but does not converge (to anything) in $Q$.


The point of view that the topology $Q$ is appropriate for studying the
semantics of logic programs with negation is given strong support by examples
such as Example 3.3.6. It is given further support in the most usual case, where
the domain of interpretation is countable, as shown in the following example.
In fact, in this next example, we show that a sequence $(I_n)$ of two-valued
interpretations converges in $Q$ to a two-valued interpretation $I$ if and only
if the symmetric difference¹⁴ $I_n \triangle I$ of the sets representing $I_n$ and $I$ can be
made arbitrarily small (in the sense described in Example 3.3.8), and this fact
appears to be in accord with one's intuition regarding negation. Indeed, the
symmetric difference provides a simple metric for the topology $Q$, as we see
next.


3.3.8 Example Let $P$ denote a normal logic program, and, to make the
discussion non-trivial, suppose that the underlying first-order language $\mathcal{L}$ of
$P$ contains at least one function symbol. Thus, $B_P$ is denumerable, and we
can suppose that the elements of $B_P$ are given some fixed listing, so that
$B_P = (A_1, A_2, A_3, \ldots)$, say. (In fact, the exact nature of $B_P$ plays no role
here, and we could work equally well over any preinterpretation $J$ for $\mathcal{L}$ whose
domain is denumerable and can therefore be listed.) Now let $d_i$ be a real
number satisfying $0 < d_i < 1$, for each $i$, and such that $\sum_{i=1}^{\infty} d_i = 1$; each $d_i$
is a weight to be attached to the element $A_i$ of $B_P$. Now define the metric $d$
on $I_P$ by

$$d(I, I') = \sum_{A_i \in I \triangle I'} d_i$$

for $I, I' \in I_P$. Note that it is routine to check that $d$ does indeed define a
metric on $I_P$, and we show that $d$ generates the topology $Q$ on $I_P$. To do this,
it suffices to show that an arbitrary sequence $(I_n)$ converges to $I$, say, in $Q$ if
and only if it converges to $I$ in the metric $d$.

Suppose that $(I_n)$ is a sequence of interpretations in $I_P$, and $I_n \to I$ in
the metric $d$. Thus, $d(I_n, I) \to 0$ as $n \to \infty$. So, given $\varepsilon > 0$, there is a natural
number $n_0$ such that whenever $n \geq n_0$ we have
$d(I_n, I) = \sum_{A_i \in I_n \triangle I} d_i < \varepsilon$. Suppose that $A_j \in I$. Choose $\varepsilon$ so small
that $\varepsilon < d_j$, and obtain the corresponding $n_0$ such that
$\sum_{A_i \in I_n \triangle I} d_i < \varepsilon$ whenever $n \geq n_0$. Then obviously
$d_j$ does not occur in this sum for any $n \geq n_0$. In other words, $A_j \in I_n \cap I$ for
all $n \geq n_0$, and so $A_j$ is eventually in $I_n$. On the other hand, suppose that
$A_j \notin I$. If $A_j$ belongs to infinitely many $I_n$, then
$\sum_{A_i \in I_n \triangle I} d_i \geq d_j$ infinitely
often, contradicting $d(I_n, I) \to 0$. Thus, $A_j$ belongs to only finitely many $I_n$,
and so $A_j$ is eventually not in $I_n$. Therefore, by Proposition 3.3.5, convergence
in $d$ implies convergence in $Q$.

Conversely, suppose $I_n \to I$ in $Q$. Given $\varepsilon > 0$, choose integers $n_0$ so large
that $\sum_{i \geq n_0} d_i < \varepsilon$ and $n_0' \geq n_0$ so large that whenever $n \geq n_0'$, $I_n \triangle I$ only
contains elements $A_j$ with $j \geq n_0$ or is empty (this situation can be achieved
by finitely many applications of Proposition 3.3.5 since the set $\{A_j ; j < n_0\}$
is finite and, in fact, contains $n_0 - 1$ elements). Then, whenever $n \geq n_0'$, we
have

$$d(I_n, I) = \sum_{A_i \in I_n \triangle I} d_i \leq \sum_{i \geq n_0} d_i < \varepsilon$$

and so $I_n \to I$ in the metric $d$. Thus, $d$ generates $Q$, as claimed.
Furthermore, we note that, in particular, the weights $d_i$ can be taken to
be $\frac{1}{2^i}$ for each $i$, in which case the metric $d$ takes the natural form

$$d(I, I') = \sum_{A_i \in I \triangle I'} \frac{1}{2^i}$$

for $I, I' \in I_P$. In any case, if $I_n \to I$ in $Q$, then $I_n \to I$ in $d$, and hence, given
any $\varepsilon > 0$, there is $n_0$ such that $d(I_n, I) = \sum_{A_i \in I_n \triangle I} d_i < \varepsilon$ whenever $n \geq n_0$,
and conversely. It is in this sense that the symmetric difference $I_n \triangle I$ can be
made arbitrarily small if $I_n \to I$ in $Q$. ∎


¹⁴ We remind the reader that the symmetric difference of sets $A$ and $B$ is defined by
$A \triangle B = (A \setminus B) \cup (B \setminus A)$.
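For interpretations whose symmetric difference is finite, the metric with weights $d_i = 1/2^i$ can be evaluated exactly. The sketch below assumes, purely for illustration, that the atom $A_i$ is represented by its index $i \geq 1$ in the fixed listing and that interpretations are finite sets of such indices; the names `d`, `limit`, and `sequence` are not from the text.

```python
from fractions import Fraction

def d(interp1, interp2):
    """Sum of the weights 1/2**i over the indices i in the symmetric difference."""
    return sum((Fraction(1, 2 ** i) for i in interp1 ^ interp2), Fraction(0))

# A hypothetical sequence differing from `limit` in one atom that drifts
# further and further along the listing.
limit = frozenset({1, 3})
sequence = [limit | {n} for n in range(4, 12)]
distances = [d(I_n, limit) for I_n in sequence]
```

Here each term differs from the limit only in the atom with index `n`, so the distance is $1/2^n$ and shrinks as the differing atom moves out along the listing; exact rational arithmetic avoids floating-point rounding in the comparison.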


Because $Q$ is a product topology, it is easy to describe the basic open sets
of $I(X,\mathcal{T})$ in $Q$ as follows (the nature of $X$ is actually irrelevant, although it is
being taken here to be $B_{P,J}$). First, given any truth value $t \in \mathcal{T}$, the singleton
set $\{t\}$ is open in $\mathcal{T}$, since $\mathcal{T}$ is endowed with the discrete topology. Therefore,
see Section A.5, the basic open sets here are of the form
$\pi_{i_1}^{-1}(t_{i_1}) \cap \ldots \cap \pi_{i_n}^{-1}(t_{i_n})$.
They therefore can be written in the form
$G(A_{i_1}, \ldots, A_{i_n}; t_{i_1}, \ldots, t_{i_n}) = \{I \in I(X,\mathcal{T}) \mid I(A_{i_j}) = t_{i_j} \text{ for } j = 1, \ldots, n\}$,
where $A_{i_1}, \ldots, A_{i_n}$ are arbitrary, but fixed, elements of $X$.

Thus, we have the following result, which describes $Q$ in the familiar terms
of basic open sets.


3.3.9 Proposition With the notation above, the basic open sets in the
topology $Q$ on the set $I(X,\mathcal{T})$ take the form
$G(A_{i_1}, \ldots, A_{i_n}; t_{i_1}, \ldots, t_{i_n}) = \{I \in I(X,\mathcal{T}) \mid I(A_{i_j}) = t_{i_j} \text{ for } j = 1, \ldots, n\}$,
where $A_{i_1}, \ldots, A_{i_n}$ are arbitrary, but fixed, elements of $X$ and
$t_{i_1}, \ldots, t_{i_n}$ are arbitrary, but fixed, elements of $\mathcal{T}$ for
$j = 1, \ldots, n$. Furthermore, the subbasic open sets in $Q$ are those basic open
sets $G(A; t)$ determined by taking $n = 1$ in the set
$G(A_{i_1}, \ldots, A_{i_n}; t_{i_1}, \ldots, t_{i_n})$.


We denote by $\mathcal{G}$ the subbase for $Q$ consisting of the sets $G(A; t)$, where
$A \in X$ and $t \in \mathcal{T}$.

In particular, the previous proposition has the following corollary when T 
is the truth set TWO. 


3.3.10 Corollary When $\mathcal{T}$ is the truth set TWO, the basic open sets in $Q$
take the form $G(A_1, \ldots, A_n; B_1, \ldots, B_m) = \{I \in I(X,\mathcal{T}) \mid A_i \in I, \text{ for } i =
1, \ldots, n, \text{ and, for } j = 1, \ldots, m, B_j \notin I\}$, where the $A_i$ and the $B_j$ are fixed,
but arbitrary, elements of $X$, and $n, m \geq 0$. Furthermore, the subbasic open
sets can be described similarly on taking $n$ and $m$ to be at most 1 in the set
$G(A_1, \ldots, A_n; B_1, \ldots, B_m)$.
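Membership in a basic open set of the kind described in this corollary is a finite check, and convergence in $Q$ means that a sequence eventually stays inside every such set containing the limit. The following sketch illustrates this on a finite prefix of a sequence; the integer encoding of atoms and the helper names `in_basic_open` and `entry_index` are assumptions made for the illustration.

```python
def in_basic_open(interp, positives, negatives):
    """True iff interp lies in G(A_1,...,A_n; B_1,...,B_m)."""
    return all(a in interp for a in positives) and all(b not in interp for b in negatives)

def entry_index(seq, positives, negatives):
    """Smallest n0 with seq[n] in the basic open set for all computed n >= n0, else None."""
    n0 = None
    for n in range(len(seq) - 1, -1, -1):
        if not in_basic_open(seq[n], positives, negatives):
            break
        n0 = n
    return n0

# Hypothetical increasing sequence I_n = {A_0, ..., A_{n-1}}, atoms encoded as integers.
seq = [frozenset(range(n)) for n in range(8)]
inside_from = entry_index(seq, positives=[0, 2], negatives=[])
never_inside = entry_index(seq, positives=[0], negatives=[1])
```

A finite prefix can only suggest eventual behaviour, of course; the point is that each basic open set constrains finitely many atoms, which is why pointwise stabilisation is all that convergence in $Q$ requires.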


Finally, we close this section by noting that a natural question to consider is
that of the continuity of the $T_P$ operator relative to the topology $Q$. However,
as already noted, we defer a discussion of this matter until Chapter 5, see
Theorem 5.4.11, since we treat this question in more generality there in a
context within which it naturally arises; in particular, we provide necessary
and sufficient conditions for the continuity of $T_P$ in $Q$ to hold. Some results are
also known which ensure discontinuity of $T_P$, see [Seda, 1995], for example,
and we pause briefly to consider an interesting example of this.


3.3.11 Example Consider the program $P$ consisting of the single clause
$p \leftarrow \neg q(X)$, whose underlying first-order language $\mathcal{L}$ is assumed to contain a
constant symbol $o$, a function symbol $s$, and predicate symbols $r$ and $t$ in
addition to the symbols present in $P$. For each binary sequence $a = (a_n)_{n \in \mathbb{N}}$ (of 0s
and 1s), we form the set $A_a = \{A_1, A_2, A_3, \ldots\}$, where $A_i = r(s^i(o))$ if $a_i = 0$
and $A_i = t(s^i(o))$ if $a_i = 1$. Finally, let $K_n = \{q(o), q(s(o)), \ldots, q(s^n(o))\}$ for
each $n \in \mathbb{N}$.

Then for each binary sequence $a$, the sequence of interpretations $I_n =
A_a \cup K_n$ converges in $Q$ to the interpretation $I_a = A_a \cup \{q(s^n(o)) \mid n \in \mathbb{N}\}$
by Theorem 3.3.4. On the other hand, $T_P(I_n) = \{p\}$, whereas $T_P(I_a) = \emptyset$.
Hence, $T_P(I_n)$ does not converge to $T_P(I_a)$ in $Q$, and so $T_P$ is discontinuous
at $I_a$.

Since we have uncountably many binary sequences $a$, $T_P$ has uncountably
many points of discontinuity in $Q$.
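The mechanism behind this discontinuity can be made concrete with a small symbolic model. The sketch below is an illustration under explicit assumptions: only the $q$-part of an interpretation is modelled (the set $A_a$ is constant along the sequence and plays no role in deriving $p$), a finite bound `n` stands for $K_n$, and `None` stands for the set of all atoms $q(s^k(o))$; the function names are hypothetical.

```python
from typing import Optional

def q_holds(q_bound: Optional[int], k: int) -> bool:
    """Truth of q(s^k(o)): bound n means exactly q(s^0(o)),...,q(s^n(o)) are true; None means all."""
    return q_bound is None or k <= q_bound

def p_derived(q_bound: Optional[int]) -> bool:
    """p lies in T_P(I) iff some ground instance q(s^k(o)) is false in I."""
    # With a finite bound n, the atom q(s^(n+1)(o)) is false, so p is derived.
    return q_bound is not None

# Pointwise, the q-parts of I_n agree with those of the limit once n >= k ...
agrees_eventually = all(
    q_holds(n, k) == q_holds(None, k) for k in range(10) for n in range(k, k + 5)
)
# ... yet p is derived from every I_n and not from the limit.
p_along_sequence = [p_derived(n) for n in range(10)]
p_at_limit = p_derived(None)
```

Every atom eventually takes its limiting value, so the arguments converge in $Q$, while the value of $p$ under $T_P$ stays true along the whole sequence and is false at the limit.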


3.4 Operators on Spaces of Valuations Revisited 


Finally, we want to briefly return to the operators defined on $I(X,\mathcal{T})$,
which were discussed in Section 1.3.4, namely, the operators $\neg$, $\vee$, and $\wedge$.
We have already noted in Section 1.3.4 that $\neg$ is not order continuous and,
hence, not Scott continuous relative to the orderings $\leq_t$, in which $f <_t t$. It
is, however, Scott continuous in the orderings $\leq_k$, as we now see. Of course,
one can similarly deal with other connectives such as $\leftarrow$ and $\leftrightarrow$ in the same
way. However, as we have seen earlier, these are usually made to depend on
the three connectives we have already considered and therefore need not be
pursued further.

Our objective here is to examine the continuity of the operators $\neg$, $\vee$, and
$\wedge$ relative to the Scott and Cantor topologies, and we first deal with the Scott
topology. Again, we concentrate on the truth set FOUR for precisely the same
reasons as stated in Section 1.3.4.


3.4.1 Theorem Let $\mathcal{T}$ denote Belnap's logic FOUR. Then the following
statements hold.

(a) The negation operator $\neg : I(X,\mathcal{T}) \to I(X,\mathcal{T})$ is continuous in the Scott
topology relative to the knowledge ordering $\sqsubseteq_k$, but not relative to the
truth ordering $\sqsubseteq_t$. The same statement is true in the case of Kleene's
strong three-valued logic.

(b) Take $\leq$ to be either $\leq_t$ or $\leq_k$ on the logic FOUR. Form the domain
$I(X,\mathcal{T})$ with the corresponding pointwise order $\sqsubseteq$ and the corresponding
product domain $I(X,\mathcal{T}) \times I(X,\mathcal{T})$. Then both $\vee$ and $\wedge$ are Scott
continuous as mappings from $I(X,\mathcal{T}) \times I(X,\mathcal{T})$ to $I(X,\mathcal{T})$. The same statements
are true in the case of classical two-valued logic (where the ordering has
to be $\leq_t$) and Kleene's strong three-valued logic.


Proof: For (a), the statements concerning the truth ordering $\sqsubseteq_t$ have already
been established. To deal with $\sqsubseteq_k$, we use the criteria for convergence
presented in Theorem 3.2.6. Let $(v_i)$ be a net converging in the Scott topology to
$v$ in $I(X,\mathcal{T})$. Suppose that $x \in (\neg v)_t$. Then $(\neg v)(x) = t$, and hence $x \in v_f$.
Since $v_i \to v$, we have that eventually $x \in (v_i)_f \cup (v_i)_b$, that is, eventually $v_i(x) = f$
or $v_i(x) = b$. But then eventually $\neg v_i(x) = t$ or $\neg v_i(x) = b$, and so eventually
$x \in (\neg v_i)_t \cup (\neg v_i)_b$. The other cases are handled similarly. Thus, the net $(\neg v_i)$
converges to $\neg v$ in the Scott topology, as required.

For (b), we establish the result stated concerning $\vee$, noting that the proof
for $\wedge$ is entirely similar. Now, as is well-known, it suffices to show continuity
in each argument¹⁵ of $\vee$, and, by commutativity, it in fact suffices to show
continuity in one argument, the first, say. So, fix $v \in I(X,\mathcal{T})$, and suppose
that $u_i \to u$ in the Scott topology on $I(X,\mathcal{T})$. Let $x \in X$ be arbitrary. Then
$(u \vee v)(x) = u(x) \vee v(x)$. Since $u_i \to u$, we have eventually that $u(x) \leq u_i(x)$
by Theorem 3.2.6. Therefore, by Proposition 1.3.7, we have eventually that
$u(x) \vee v(x) \leq u_i(x) \vee v(x)$, and this suffices, by Theorem 3.2.6, to show that
$u_i \vee v \to u \vee v$, as required. ∎


We now turn our attention to these same operators in relation to the 
topology Q. Indeed, we close this chapter with the following result. 


3.4.2 Theorem Let $\mathcal{T}$ denote Belnap's logic FOUR. Then the following
statements hold.

(a) The negation operator $\neg : I(X,\mathcal{T}) \to I(X,\mathcal{T})$ is continuous in the
topology $Q$. Hence, it is continuous in $Q$ when $\mathcal{T}$ denotes either classical
two-valued logic or Kleene's strong three-valued logic.

(b) Both $\vee$ and $\wedge$ are continuous as mappings from $I(X,\mathcal{T}) \times I(X,\mathcal{T})$ to
$I(X,\mathcal{T})$, where $I(X,\mathcal{T}) \times I(X,\mathcal{T})$ is endowed with the product topology
of $Q$ with itself. Hence, the same result holds relative to either classical
two-valued logic or Kleene's strong three-valued logic.


Proof: For (a), let $(v_i)$ be a net converging to $v$ in $I(X,\mathcal{T})$ relative to the
topology $Q$, and let $x \in X$ be arbitrary. Then eventually $v_i(x) = v(x)$.
Therefore, eventually $(\neg v_i)(x) = (\neg v)(x)$. Therefore, $\neg v_i \to \neg v$ in $Q$, and the result
follows.

For (b), let $(u_i, v_i) \to (u, v)$ in the product topology. Then $u_i \to u$ in $Q$
and $v_i \to v$ in $Q$. Let $x \in X$ be arbitrary. Then there exist $i_1$ and $i_2$ such
that $u_i(x) = u(x)$ whenever $i \geq i_1$ and $v_i(x) = v(x)$ whenever $i \geq i_2$. By
directedness, there is $i_3$ such that, for $i \geq i_3$, we have both $u_i(x) = u(x)$ and
$v_i(x) = v(x)$. Therefore, whenever $i \geq i_3$, we have $u_i(x) \vee v_i(x) = u(x) \vee v(x)$
and $u_i(x) \wedge v_i(x) = u(x) \wedge v(x)$. Therefore, $u_i \vee v_i \to u \vee v$ and $u_i \wedge v_i \to u \wedge v$,
as required. ∎


¹⁵ See Proposition 2.4 of [Stoltenberg-Hansen et al., 1994].
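The behaviour of the connectives on FOUR established in Theorems 3.4.1 and 3.4.2 can be explored with a small encoding. The sketch below is not taken from the text: it assumes the common representation of each truth value as a pair (evidence for, evidence against), writes the fourth value as `U`, and takes $\vee$ and $\wedge$ to be the join and meet of the truth ordering; all function names are hypothetical.

```python
# Truth values of FOUR encoded (as an assumption) by (evidence for, evidence against).
T, F, U, B = (1, 0), (0, 1), (0, 0), (1, 1)
FOUR = [T, F, U, B]

def neg(v):
    """Negation swaps the evidence for and the evidence against."""
    return (v[1], v[0])

def lor(v, w):
    """Join in the truth ordering."""
    return (v[0] | w[0], v[1] & w[1])

def land(v, w):
    """Meet in the truth ordering."""
    return (v[0] & w[0], v[1] | w[1])

def leq_k(v, w):
    """Knowledge ordering: w carries at least the information of v."""
    return v[0] <= w[0] and v[1] <= w[1]

def leq_t(v, w):
    """Truth ordering: w is at least as true and at most as false as v."""
    return v[0] <= w[0] and v[1] >= w[1]

def pointwise(op, *vals):
    """Lift a connective to valuations represented as dictionaries over the same keys."""
    return {x: op(*(v[x] for v in vals)) for x in vals[0]}

# Hypothetical sequences of valuations on X = {"x", "y"} that eventually equal u and v.
u = {"x": T, "y": U}
v = {"x": F, "y": B}
u_seq = [{"x": B if n < 3 else T, "y": U} for n in range(6)]
v_seq = [{"x": F, "y": U if n < 2 else B} for n in range(6)]
join_seq = [pointwise(lor, a, b) for a, b in zip(u_seq, v_seq)]
```

In this encoding negation preserves the knowledge ordering but reverses the truth ordering, and once both argument sequences agree pointwise with their limits the values of the lifted connectives agree too, which is the content of the argument for Theorem 3.4.2.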


There are several interesting topics relating to topology and logic pro- 
gramming semantics which are examined in the literature on the subject, 
but are not pursued here. These include, among other things, the consis- 
tency of program completions and of the union of program completions, see 
[Batarekh and Subrahmanian, 1989b]; compactness of spaces of models for a 
program; and continuity in $Q$ of $T_P$ for a normal program $P$ at the point
$T_P \downarrow \omega$ and the coincidence of $T_P \downarrow \omega$ with the greatest fixed point of $T_P$. For
further discussion of all these points and others, see [Seda, 1995].

In conclusion, we note that order is a very satisfactory foundation for the 
semantics of procedural and imperative programming languages as exempli- 
fied through the denotational semantics approach to programming language 
theory. On the other hand, order is not an entirely satisfactory foundation 
for the semantics of logic programming languages in the presence of negation, 
and yet negation is a natural part of most logics. However, our treatment here 
and in later chapters shows that one can consider convergence instead as a 
foundation for a unified approach by which one can recover conventional order- 
theoretic semantics and at the same time display some important standard 
models in logic programming languages as limits of a sequence of iterates. 
In addition, convergence conditions involving nets arise very naturally in a 
number of areas within theoretical computer science and are simple to state 
and to comprehend. Moreover, nets usually give short and technically simple 
proofs, as demonstrated in several places in this chapter. 


Chapter 4 


Fixed-Point Theory for Generalized 
Metric Spaces 


In Chapters 1 and 2, we gave ample evidence of the fundamental role played 
by the Kleene and Knaster-Tarski fixed-point theorems, Theorems 1.1.9 and 
1.1.10, in logic programming semantics. Moreover, we have also seen that the 
operator Tp need not be monotonic for normal programs and, hence, that 
the theorems just cited are not generally applicable to Tp in this case. It is, 
therefore, of interest to consider possible alternatives to Theorems 1.1.9 and 
1.1.10, and in this chapter we discuss a number of such fixed-point theorems 
and some related results which will be put to use later on. 

Almost always, alternatives to the theorems of Kleene and Knaster-Tarski
employ distance functions in their formulations and in their applications.¹
Logic programming is no exception to this rule, and we will consider a num- 
ber of ways in which distance functions can be naturally introduced into this 
subject along with appropriate fixed-point theorems. Part of this process con- 
sists of working with quite general distance functions, relaxing in one way 
or another the standard axioms for a metric, and establishing corresponding 
fixed-point theorems analogous to the Banach contraction mapping theorem. 
Nevertheless, the applications we make later and the examples we discuss 
show that these general distance functions do quite easily and naturally arise 
in logic programming, although applications will be deferred until Chapter 5. 
Indeed, Sections 4.1 to 4.7 in this chapter deal with the different generalized 
metrics and corresponding fixed-point theorems we develop for single-valued 
mappings, while in Section 4.8 we examine the interconnections between the 
spaces underlying the various distance functions we study and also discuss a 
number of relevant examples. In Sections 4.9 to 4.14, we consider the corre- 
sponding results for multivalued mappings. Hence, in summary, this chapter 
is a self-contained account of the pure metric fixed-point theory appropriate 
to logic programming and also provides the tools needed for the application of 
distance functions in developing a unified approach to the fixed-point theory 
of very general and significant classes of logic programs in Chapters 5 and 
6. In addition, the methods and results discussed in this chapter have poten- 
tial applications to a wider spectrum of topics in computer science than just 
simply logic programming, but none of these will be pursued here. 


¹ We refer again to [Kirk and Sims, 2001] as an excellent source of information on
fixed-point theory in general.

Remark We refer the reader to the paper [Seda and Hitzler, 2010] for a
discussion of many recent and fairly recent applications of distance functions
to various parts of computer science. The areas in question range from
conventional semantics ([Arnold and Nivat, 1980b, Arnold and Nivat, 1980a,
Bukatin and Scott, 1997, O'Neill, 1996, Smyth, 1992]) and the study of
concurrency ([de Bakker and de Vink, 1996, Reed et al., 1991]) to domain theory
([Künzi et al., 2006, Krötzsch, 2006, Martin, 2000, Waszkiewicz, 2003]) to
information theory, cognitive processes and unique fingerprinting of time series
([Albeverio et al., 1999, Khrennikov, 1998, Khrennikov, 2004, Murtagh, 2004,
Murtagh, 2005]) to abstract interpretation ([Crazzolara, 1997]) to complexity
and its connections with semantics ([Castro-Company et al., 2007,
Romaguera and Schellekens, 2003, Rodríguez-López et al., 2008]), to
neural-symbolic integration ([Bader et al., 2006, Hitzler et al., 2004, Seda, 2006]),
to measuring the distance between programs in software engineering
([Bukatin, 2002, Seda and Lane, 2003]), through to bioinformatics and the
properties of p-adic numbers of DNA sequences and degeneracy of genetic
codes ([Dragovich and Dragovich, 2006, Khrennikov and Kozyrev, 2007]), and
beyond.


4.1 Distance Functions in General 


At a completely general level, a distance function $d$ defined on a set $X$ is
simply a mapping $d : X \times X \to A$, where $A$ is some suitable set of values
(a distance set or value set), and the distance between $x$ and $y$ is taken to
be the element $d(x,y)$ of $A$. Second, and again at a completely general level,
the related notion of closeness can be defined by assigning to each element
$x$ of a set $X$ a family $\mathcal{U}_x$ of subsets $U$ of $X$; then $y$ can be thought of as
close to $x$ if $y$ belongs to some element $U$ of $\mathcal{U}_x$. These notions are somewhat
dual to each other, even synonymous, as we shall see shortly. However, the
present level of generality is too high to be useful, and therefore we will impose
a variety of restrictions as we proceed.² In fact, it is our intention to begin
by briefly considering a uniform, conceptual framework, namely, continuity
spaces,³ within which all the particular distance functions we encounter can be
described. Indeed, this framework is such that the notions of distance function
and closeness are actually dual to each other when the set $\mathcal{U}_x$ is taken, for
each $x \in X$, to be the neighbourhood base of $x$, as defined in the Appendix,

²[Waszkiewicz, 2002] contains a very general study of spaces based on the notion of
distance function.

³Our treatment of continuity spaces follows [Kopperman, 1988] closely. We refer also
to [Flagg and Kopperman, 1997] and related papers, where the notion of continuity space
has been developed further in a number of directions, and to [Künzi, 2001] for further
background.




see Theorem A.2.5 in particular. This last observation connects topology and 
distance in full generality, and this setting, while not the most general to 
have been found to be of interest in computer science, as already noted in 
Chapter 3, is sufficient for our purposes here. In fact, we shall make no actual 
use of continuity spaces and present them purely as a framework within which 
to work. However, continuity spaces do provide a smooth transition from the 
topology presented in Chapter 3 to the work of this chapter, and indeed they 
bridge the two chapters. 

Before turning to the details of continuity spaces in general, it will be worth
considering first the familiar case of distance functions $d$ which are metrics,
see Definition 4.2.1 and Remark 4.2.2. In this case, the usual value set $A$ of $d$ is
the interval $[0, \infty)$. Given some real number $\varepsilon > 0$, one defines the (open) ball
$N_\varepsilon(x)$ of radius $\varepsilon$ about a point $x \in X$ by setting
$N_\varepsilon(x) = \{y \in X \mid d(x,y) < \varepsilon\}$. A subset $O$ of $X$ is then declared to be open if, for
each $x \in O$, there is some $\varepsilon > 0$ such that $N_\varepsilon(x) \subseteq O$. It is easy to see that the
collection of such open sets $O$ forms a topology on $X$. Notice that in defining “open” sets
$O$ here, one can equivalently require $B_{\varepsilon'}(x) \subseteq O$ for suitable $\varepsilon' > 0$, where
$B_\varepsilon(x) = \{y \in X \mid d(x,y) \le \varepsilon\}$ denotes the (closed) ball of radius $\varepsilon$ about a
point $x \in X$.

However, it is not true that every topology on $X$ arises thus via a metric
$d$, and, for example, this statement applies to the Scott topology since
this topology is not even $T_1$ in general, see Proposition A.6.5, whereas every
metrizable topology is Hausdorff. Nevertheless, every topology can be generated
by means of a suitable distance function, as already noted, and we next
consider briefly the details of one way of establishing this claim, beginning
with several definitions.


4.1.1 Definition A semigroup is a set $A$ together with an (additive) associative
binary operation $+ : A \times A \to A$. If $+$ is also commutative, then the
semigroup is called commutative or Abelian. A semigroup $A$ is called a
semigroup with identity if there exists an element $0 \in A$, called the identity, such
that $0 + a = a + 0 = a$ for all $a \in A$. We note that an (additive) Abelian
semigroup with identity is also called a commutative monoid or Abelian monoid.

By an ordered semigroup with identity we mean a semigroup $A$ with $0$, say,
on which there is defined an ordering $\le$ satisfying: $0 \le a$ for all $a \in A$, and if
$a_1 \le a_2$ and $a_1' \le a_2'$, then $a_1 + a_1' \le a_2 + a_2'$ for all $a_1, a_1', a_2, a_2' \in A$.


4.1.2 Definition A value semigroup $A$ is an additive Abelian semigroup with
identity $0$ and absorbing element $\infty$,⁴ where $\infty \ne 0$, satisfying the following
axioms.


(1) For all $a, b \in A$, if $a + x = b$ and $b + y = a$ for some $x, y \in A$, then $a = b$.
(Note that, using this property, we can define a partial order $\le$ on $A$ by
setting $a \le b$ if and only if $b = a + x$ for some $x \in A$; we call $\le$ the partial
order induced on $A$ by the operation $+$. It is immediate that $A$ equipped
with this partial order is an ordered semigroup, as just defined.)


⁴An element satisfying $a + \infty = \infty + a = \infty$ for all $a \in A$.


(2) For each $a \in A$, there is a unique $b$ ($= \frac{a}{2}$) $\in A$ such that $b + b = a$.


(3) For all $a, b \in A$, the infimum $a \wedge b$ of $a$ and $b$ exists in $A$ relative to the
partial order $\le$ defined in (1).


(4) For all $a, b, c \in A$, $(a \wedge b) + c = (a + c) \wedge (b + c)$.

Note that if $\{(A_i, +_i, 0_i, \infty_i) \mid i \in I\}$ is a family of value semigroups, then
so is their product $(A, +, 0, \infty)$, where $+$, $0$, and $\infty$ are defined coordinatewise.


4.1.3 Definition A set P of positives in a value semigroup A is a subset P 
of A satisfying the following axioms. 


(1) If $r, s \in P$, then $r \wedge s \in P$.
(2) If $r \in P$ and $r \le a$, then $a \in P$.
(3) If $r \in P$, then $\frac{r}{2} \in P$.
(4) If $a \le b + r$ for all $r \in P$, then $a \le b$.


4.1.4 Example The set R of extended real numbers $[0, \infty]$ together with
addition forms a value semigroup, the set $(0, \infty]$ is a set of positives for this
example, and the induced partial order $\le$ is the usual one on R.


4.1.5 Definition A continuity space is a quadruple $\mathcal{X} = (X, d, A, P)$, where
$X$ is a non-empty set, $A$ is a value semigroup, $P$ is a set of positives in $A$,
and $d : X \times X \to A$ is a function, called a continuity function, satisfying the
following axioms.


(1) For all $x \in X$, $d(x,x) = 0$.
(2) For all $x, y, z \in X$, $d(x,z) \le d(x,y) + d(y,z)$.

Finally, we define the topology generated by a continuity space.

4.1.6 Definition Suppose that $\mathcal{X} = (X, d, A, P)$ is a continuity space. Let
$x \in X$, and let $b \in P$. Then $B_b(x) = \{y \in X \mid d(x,y) \le b\}$ is called the ball
of radius $b$ about $x$. The topology $\mathcal{T}(\mathcal{X})$ generated by $\mathcal{X}$ consists of all those
subsets $O$ of $X$ satisfying the property: if $x \in O$, then $B_b(x) \subseteq O$ for some
$b \in P$.


The main result concerning continuity spaces is the following theorem due
to R. Kopperman [Kopperman, 1988].

4.1.7 Theorem Given a continuity space $\mathcal{X} = (X, d, A, P)$, the collection
$\mathcal{T}(\mathcal{X})$ of subsets of $X$ is a topology on $X$. Conversely, given a topology $\mathcal{T}$ on
a set $X$, there is a continuity space $\mathcal{X} = (X, d, A, P)$ with the property that
$\mathcal{T} = \mathcal{T}(\mathcal{X})$.


Given a topology $\mathcal{T}$ on $X$, it is worth noting that the continuity space
$\mathcal{X} = (X, d, A, P)$ with the property that $\mathcal{T} = \mathcal{T}(\mathcal{X})$ used in the proof of
Theorem 4.1.7 is obtained, see [Kopperman, 1988], by taking $A$ to be the
product of $\mathcal{T}$ copies of R and $P$ to be the product of $\mathcal{T}$ copies of $(0, \infty]$. The
continuity function $d$ is defined coordinatewise by $d(x,y)(S) = d_S(x,y)$ for
each $S \in \mathcal{T}$, where $d_S(x,y) = 0$ if ($x \in S$ implies $y \in S$), and $d_S(x,y) = q$
otherwise, where $q$ is an element of $(0, \infty]$ fixed once and for all.
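
The construction just described can be tried out on a small finite topology. The following Python sketch is only an illustration under stated assumptions: the names `d_S`, `distance`, `ball`, and `generated_topology`, the choice $q = 1$, and the three-point example are ours and are not taken from [Kopperman, 1988]. On a finite space the balls shrink with the radius, so it suffices to test the smallest ball at each point; in the infinite case the choice of the set of positives needs more care than this sketch reflects.

```python
from itertools import chain, combinations

Q = 1.0  # the fixed value q in (0, inf]; an arbitrary choice for this sketch

def d_S(S, x, y):
    """Coordinate d_S(x, y): 0 if (x in S implies y in S), and q otherwise."""
    return 0.0 if (x not in S or y in S) else Q

def distance(topology, x, y):
    """d(x, y) as a tuple with one coordinate per open set S."""
    return tuple(d_S(S, x, y) for S in topology)

def leq(u, v):
    """Coordinatewise order on the product value semigroup."""
    return all(a <= b for a, b in zip(u, v))

def ball(X, topology, x, r):
    """B_r(x) = {y | d(x, y) <= r}."""
    return frozenset(y for y in X if leq(distance(topology, x, y), r))

def generated_topology(X, topology):
    """All O such that each x in O has some ball B_r(x) contained in O.

    Every ball about x contains the ball of radius q/2 in each coordinate,
    so only that smallest ball is tested."""
    r_min = tuple(Q / 2 for _ in topology)
    subsets = chain.from_iterable(
        combinations(sorted(X), k) for k in range(len(X) + 1))
    return {frozenset(O) for O in subsets
            if all(ball(X, topology, x, r_min) <= frozenset(O) for x in O)}

# Hypothetical example: a chain topology on three points.
X = frozenset({0, 1, 2})
T = [frozenset(), frozenset({0}), frozenset({0, 1}), X]
recovered = generated_topology(X, T)
```

In this example `recovered` coincides with the original topology `T`, in line with the converse part of Theorem 4.1.7.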


4.2 Metrics and Their Generalizations 


As already noted, it is our intention, with applications in mind, to choose 
suitable value sets for distance functions and to impose various useful condi- 
tions on the distance functions themselves. We begin by considering the most 
familiar of these, where the value set is taken to be the set of non-negative 
real numbers. 


4.2.1 Definition Let $X$ be a set, and let $\varrho : X \times X \to \mathbb{R}_0^+$ be a distance
function, where $\mathbb{R}_0^+$ denotes the set of non-negative real numbers. We consider
the following conditions on $\varrho$.

(M1) For all $x \in X$, $\varrho(x,x) = 0$.
(M2) For all $x, y \in X$, if $\varrho(x,y) = \varrho(y,x) = 0$, then $x = y$.
(M3) For all $x, y \in X$, $\varrho(x,y) = \varrho(y,x)$.
(M4) For all $x, y, z \in X$, $\varrho(x,y) \le \varrho(x,z) + \varrho(z,y)$.
(M5) For all $x, y, z \in X$, $\varrho(x,y) \le \max\{\varrho(x,z), \varrho(z,y)\}$.


If $\varrho$ satisfies conditions (M1) to (M4), it is called a metric and is called an
ultrametric if it also satisfies (M5).⁵ If it satisfies conditions (M1), (M3), and (M4),
it is called a pseudometric. If it satisfies (M2), (M3), and (M4), we will call
it a dislocated metric (or simply a d-metric). Finally, if it satisfies conditions
(M1), (M2), and (M4), it is called a quasimetric. Condition (M4) is usually


⁵For elementary properties and notions relating to conventional metrics, such as Cauchy
sequences and completeness, we refer to [Willard, 1970]; these notions will, in any case, be
defined later in this chapter in greater generality.

TABLE 4.1: Generalized metrics: Definition 4.2.1.


notion                         (M1)  (M2)  (M3)  (M4)  (M5)
metric                          x     x     x     x
ultrametric                     x     x     x    (x)    x
pseudometric                    x           x     x
pseudo-ultrametric              x           x    (x)    x
quasimetric                     x     x           x
quasi-ultrametric               x     x          (x)    x
dislocated metric                     x     x     x
dislocated ultrametric                x     x    (x)    x
dislocated quasimetric                x           x
dislocated quasi-ultrametric          x          (x)    x
quasi-pseudometric              x                 x
quasi-pseudo-ultrametric        x                (x)    x


called the triangle inequality. Furthermore, if a (pseudo, quasi, d-)metric sat- 
isfies the strong triangle inequality (M5), then it is called a (pseudo-, quasi-, 
d-) ultrametric. These notions are displayed in Table 4.1, where the symbol 
x indicates that the respective condition is satisfied and the symbol (x) in- 
dicates that the respective condition is automatically satisfied; for example, 
since the condition (M5) implies (M4), any distance function satisfying (M5) 
automatically satisfies (M4). 
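
For a distance function on a finite set, the conditions of Definition 4.2.1 can be checked exhaustively and the result looked up in Table 4.1. The following Python sketch is an illustration only; the function names `satisfied_conditions` and `classify`, the lookup table `NAMES`, and the example distance functions are our own assumptions and do not appear in the text.

```python
from itertools import product

def satisfied_conditions(X, rho):
    """Return which of (M1)-(M5) hold for rho on the finite set X."""
    pts = list(X)
    pairs = list(product(pts, repeat=2))
    triples = list(product(pts, repeat=3))
    return {
        "M1": all(rho(x, x) == 0 for x in pts),
        "M2": all(x == y for x, y in pairs
                  if rho(x, y) == 0 and rho(y, x) == 0),
        "M3": all(rho(x, y) == rho(y, x) for x, y in pairs),
        "M4": all(rho(x, y) <= rho(x, z) + rho(z, y) for x, y, z in triples),
        "M5": all(rho(x, y) <= max(rho(x, z), rho(z, y))
                  for x, y, z in triples),
    }

# Keys are the truth values of (M1), (M2), (M3); each entry gives the name
# used when only (M4) holds and the name used when (M5) holds as well.
NAMES = {
    (True, True, True): ("metric", "ultrametric"),
    (True, False, True): ("pseudometric", "pseudo-ultrametric"),
    (True, True, False): ("quasimetric", "quasi-ultrametric"),
    (False, True, True): ("dislocated metric", "dislocated ultrametric"),
    (False, True, False): ("dislocated quasimetric",
                           "dislocated quasi-ultrametric"),
    (True, False, False): ("quasi-pseudometric", "quasi-pseudo-ultrametric"),
}

def classify(X, rho):
    """Most specific notion from Table 4.1 satisfied by rho, or None."""
    c = satisfied_conditions(X, rho)
    names = NAMES.get((c["M1"], c["M2"], c["M3"]))
    if not c["M4"] or names is None:
        return None
    return names[1] if c["M5"] else names[0]

X = range(3)
usual = classify(X, lambda x, y: abs(x - y))
discrete = classify(X, lambda x, y: 0 if x == y else 1)
dislocated = classify(X, lambda x, y: max(x, y))
one_sided = classify(X, lambda x, y: max(y - x, 0))
```

The third example, $\varrho(x,y) = \max\{x,y\}$, fails (M1) at every $x > 0$ and is therefore dislocated; the fourth is not symmetric and so is only a quasimetric.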


Note that one can take the codomain of $\varrho$ to be $[0, \infty]$ in Definition 4.2.1
rather than $\mathbb{R}_0^+$. We note then that all the distance functions just considered
in Definition 4.2.1, apart from dislocated metrics, are continuity functions,
as is easily checked. However, even dislocated metrics give rise to topologies,
and essentially the same correspondence holds between them and topologies as
holds between continuity spaces and topologies, as we see later. Indeed, each
d-metric gives rise to its associated metric, see Definition 4.8.9, and each
d-generalized ultrametric gives rise to its associated generalized ultrametric, see
Definition 4.8.19.


4.2.2 Remark As far as notation for distance functions is concerned, we will, 
generally, although not rigidly, use d and occasionally \ to denote metrics, 
ultrametrics, pseudometrics, and quasimetrics, all as just defined; we will use 
$\varrho$ and occasionally $\rho$ to denote d-metrics, to denote generalized ultrametrics
as introduced in Section 4.3, and to denote the extensions of these notions 
studied in Section 4.4 and beyond. This convention will be employed both 
in the context of single-valued mappings and in the context of multivalued 


⁶See [Hitzler and Seda, 2000] for full details of the topology determined by a d-metric.

mappings and is intended to help the reader to remember the nature of the
distance function under consideration at any given time. The one exception to
this occurs in Section 4.8.4, where we encounter two generalized ultrametrics
the second of which is derived from the first. In this instance, we retain the
notation $\varrho$ for the first of these generalized ultrametrics and $d$ for the second;
essentially, the same comment applies to Section 5.1, where the results of 
Section 4.8.4 are applied. 


The most widely used of the distance functions just defined is that of 
metric, and to that extent we regard metric distance functions as basic and 
think of departures from them as variants. 

The following well-known theorem, usually referred to as the Banach con- 
traction mapping theorem, is fundamental in many areas of mathematics. It 
is prototypical of a large number of extensions and refinements, including all 
those we discuss in this chapter. We give the well-known proof in detail for 
later reference. 


4.2.3 Theorem (Banach) Let $(X,d)$ be a complete metric space, let
$0 \le \lambda < 1$, and let $f : X \to X$ be a contraction with contractivity factor $\lambda$, that
is, $f$ is a (single-valued) function satisfying $d(f(x), f(y)) \le \lambda d(x,y)$ for all
$x, y \in X$ with $x \ne y$. Then $f$ has a unique fixed point, which can be obtained
as the limit of the sequence $(f^n(y))$ for any $y \in X$.


Proof: The proof consists of the following three steps. It is shown that (1)
$(f^n(y))_{n \ge 0}$ is a Cauchy sequence for all $y \in X$, (2) the limit of this Cauchy
sequence is a fixed point of $f$, and (3) this fixed point is unique.

(1) Let $m, n \in \mathbb{N}$, suppose that $m > n$, and put $k = m - n$. Then we obtain

\begin{align*}
d(f^m(y), f^n(y)) &= d(f^{n+k}(y), f^n(y)) \\
&\le \sum_{i=0}^{k-1} d(f^{n+i+1}(y), f^{n+i}(y)) \le \sum_{i=0}^{k-1} \lambda^{n+i} d(f(y), y) \\
&= \lambda^n \sum_{i=0}^{k-1} \lambda^i d(f(y), y) \le \lambda^n \sum_{i=0}^{\infty} \lambda^i d(f(y), y) \\
&= \frac{\lambda^n}{1 - \lambda} d(f(y), y).
\end{align*}

The latter term converges to 0 as $n \to \infty$, and this establishes (1).

(2) Now $X$ is complete, and so $(f^n(y))_{n \ge 0}$ has a limit $x$. Thus, we obtain

\[ f(x) = f(\lim f^n(y)) = \lim f^{n+1}(y) = x \]

by continuity of $f$. Therefore, $x$ is a fixed point of $f$.

(3) Assume now that $z$ is also a fixed point of $f$. Then $d(x,z) =
d(f(x), f(z)) \le \lambda d(x,z)$. Since $\lambda < 1$, we obtain $d(x,z) = 0$, and hence,
by (M2), we have $x = z$, as required. ∎

Notice that the condition $x \ne y$ is not actually needed in the statement of
the previous result, but is included for the sake of consistency with what we
want to say next, namely, that it is well-known⁷ that the requirement $\lambda < 1$
cannot be relaxed in general. This can be seen by considering the function
$f : \mathbb{R} \to \mathbb{R}$ defined by

\[ f(x) = \begin{cases} x + \frac{1}{x} & \text{for } x \ge 1, \\ 2 & \text{otherwise.} \end{cases} \]

This function satisfies the condition $d(f(x), f(y)) < d(x,y)$ for all $x, y \in \mathbb{R}$
with $x \ne y$, where $d$ is the usual metric on $\mathbb{R}$, but has no fixed point since
$f(x) > x$ for all $x \in \mathbb{R}$. If $X$ is compact, however, the requirement on $\lambda$ can
be relaxed.
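
Both the iteration in the Banach theorem and the role of the requirement $\lambda < 1$ can be illustrated numerically. The following Python sketch rests on our own assumptions: the names `banach_iterate` and `f_no_fixed_point`, the choice of $\cos$ on $[0,1]$ with $\lambda = \sin 1$, the tolerance, and the rational test grid are not taken from the text. The stopping rule uses the bound $\frac{\lambda^n}{1-\lambda}\, d(f(y), y)$ from step (1) of the proof, and the second function is the map $x \mapsto x + \frac{1}{x}$ for $x \ge 1$ and $2$ otherwise.

```python
import math
from fractions import Fraction
from itertools import combinations

def banach_iterate(f, y, lam, tol=1e-12, max_iter=10_000):
    """Iterate x_n = f^n(y) until the a priori bound
    lam**n / (1 - lam) * d(f(y), y) on the distance from x_n to the
    fixed point drops below tol. Requires 0 <= lam < 1."""
    d0 = abs(f(y) - y)
    x = y
    for n in range(max_iter):
        if lam ** n / (1 - lam) * d0 < tol:
            return x, n
        x = f(x)
    raise RuntimeError("tolerance not reached within max_iter steps")

# cos maps [0, 1] into itself, and |cos'(x)| = |sin x| <= sin(1) < 1 there,
# so cos is a contraction on [0, 1] with contractivity factor sin(1).
fixed_point, steps = banach_iterate(math.cos, 0.5, math.sin(1.0))

def f_no_fixed_point(x):
    """x + 1/x for x >= 1, and 2 otherwise."""
    return x + 1 / x if x >= 1 else Fraction(2)

# Exact check with rational arithmetic on a grid in [-5, 10].
grid = [Fraction(k, 4) for k in range(-20, 41)]
strictly_contracting = all(
    abs(f_no_fixed_point(x) - f_no_fixed_point(y)) < abs(x - y)
    for x, y in combinations(grid, 2))
moves_every_point = all(f_no_fixed_point(x) > x for x in grid)
```

On the grid, the second function shortens every distance strictly and yet moves every point to the right, which is the behaviour that rules out a fixed point on the non-compact space $\mathbb{R}$.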


4.2.4 Theorem Let $(X,d)$ be a compact metric space, and let $f : X \to X$
be a function which is strictly contracting, that is, $f$ satisfies $d(f(x), f(y)) <
d(x,y)$ for all $x, y \in X$ with $x \ne y$. Then $f$ has a unique fixed point.


Proof: The function $\delta(x) = d(x, f(x))$ is continuous since $f$ is continuous.
It therefore achieves a minimum $m$ on $X$, say at $x_0$. Assume $\delta(x_0) = m > 0$. Then
$\delta(f(x_0)) = d(f(x_0), f(f(x_0))) < d(x_0, f(x_0)) = \delta(x_0) = m$, which is a
contradiction. Hence, $m = 0$, and so $f$ has a fixed point.

Assume $x$ and $y$ are fixed points of $f$ and $x \ne y$. Then $d(x,y) =
d(f(x), f(y)) < d(x,y)$, which is a contradiction. Therefore, the fixed point
of $f$ is unique. ∎


There is quite a lot of interest in establishing results which can be viewed 
in one way or another as converses of the Banach theorem.⁸ The following
is such a result. It was originally inspired by certain applications to logic 
programming, to be given in Chapters 5 and 6, of the results presented in this 
chapter. 


4.2.5 Theorem Let $(X, \tau)$ be a $T_1$ topological space, and let $f : X \to X$
be a function which has a unique fixed point $a$ and is such that, for each
$x \in X$, the sequence $(f^n(x))$ converges to $a$ in $\tau$. Then there exists a function
$d : X \times X \to \mathbb{R}$ such that $(X,d)$ is a complete ultrametric space and such that
for all $x, y \in X$ we have $d(f(x), f(y)) \le \frac{1}{2} d(x,y)$.


Proof: The proof is divided into several steps, numbered consecutively. 

(1) Given $x \in X$, we define the set $T(x) \subseteq X$ to be the smallest subset of
X which is closed under the following rules. 

(1.1) x € T(x). 


⁷The results of Section 4.2 can be found in many places including [Kirk and Sims, 2001,
Dugundji and Granas, 1982], for example.

⁸A discussion of this question can be found in [Kirk and Sims, 2001] and its references
and in [Istrățescu, 1981].

(1.2) If $y \in T(x)$ and $f(y) \ne a$, then $f(y) \in T(x)$.

(1.3) If $y \in T(x)$ and $y \ne a$, then $f^{-1}(y) \subseteq T(x)$.

It is clear that the intersection of the family of all sets closed under these rules
is itself closed under these rules, and hence $T(x)$ exists. Moreover, it is also
clear that each of the sets $T(x)$ is non-empty. Now let $\mathcal{T} = \{T(x) \mid x \in X\}$,
and observe the following facts.

(i) $T(a) = \{a\}$. To see this, we note that (1.1), (1.2), and (1.3) are all true
relative to the set $\{a\}$. Therefore, by minimality, we have $T(a) = \{a\}$.

(ii) If $x \ne a$, then $a \notin T(x)$, and so $T(a) \cap T(x) = \emptyset$. Hence, either $T(a)$ and
$T(x)$ are equal or they are disjoint. To see this, suppose $x \ne a$, and consider
rule (1.3). Clearly, we cannot have $a \in f^{-1}(x)$; otherwise, $f(a) = x$, and hence
$a = x$, which is a contradiction. Thus, rules (1.2) and (1.3) applied repeatedly
and starting with $x$ never place $a$ in $T(x)$, and, by minimality, the process
just described generates $T(x)$.

(iii) If $T(x) \ne T(a)$ and $T(y) \ne T(a)$, then either $T(x)$ and $T(y)$ are equal
or they are disjoint. To see this, suppose $z \in T(x) \cap T(y)$. Then the rules (1.1),
(1.2), and (1.3) under repeated application starting with $z$ force $T(x) = T(y)$.
Thus, the collection $\mathcal{T}$ is a partition of $X$.

(2) We next inductively define a mapping $l : T \to \mathbb{Z} \cup \{\infty\}$ on each $T \in \mathcal{T}$.

(2.1) We set $l(a) = \infty$, and this defines $l$ on $T = T(a)$. If $T \ne T(a)$, we
choose an arbitrary $x \in T$ and set $l(x) = 0$ (of course, $x \ne a$) and proceed as
follows.

(2.2) For each $y \in T$ with $f(y) \ne a$ and $l(y) = k$, let $l(f(y)) = k + 1$.
(2.3) For each $y \in T$ with $l(y) = k$, let $l(z) = k - 1$ for all $z \in f^{-1}(y)$.
We will henceforth assume that all this is done for every $T \in \mathcal{T}$ so that $l$ is a
function defined on all of $X$. It is clear that the mapping $l$ is well-defined since
$(X, \tau)$ is a $T_1$ space.⁹ For, if there is a cycle in the sequence $f^n(x)$ of iterates
for some $x \in X$, then we can arrange for some element $y$ in this sequence to
be frequently not in some neighbourhood of $a$, using the fact that $X$ is $T_1$,
which contradicts the convergence of the sequence $f^n(x)$ to $a$.

(3) Define a mapping $\iota : \mathbb{Z} \cup \{\infty\} \to \mathbb{R}$ by

\[ \iota(k) = \begin{cases} 0 & \text{if } k = \infty, \\ 2^{-k} & \text{otherwise.} \end{cases} \]

Furthermore, define a mapping $\delta : X \times X \to \mathbb{R}$ by

\[ \delta(x,y) = \max\{\iota(l(x)), \iota(l(y))\} \]

and a mapping $d : X \times X \to \mathbb{R}$ by

\[ d(x,y) = \begin{cases} \delta(x,y) & \text{if } x \ne y, \\ 0 & \text{if } x = y. \end{cases} \]

⁹We can weaken the requirement of $\tau$ being $T_1$ by replacing it with the following
condition: for every $y \in X$ with $y \ne a$ there exists an open neighbourhood $U$ of $a$ with $y \notin U$.

(4) We show that $(X,d)$ is an ultrametric space. (M2) Let $d(x,y) = 0$, and
assume that $x \ne y$. Then we have $\delta(x,y) = d(x,y) = 0$. Therefore, we obtain
$\max\{\iota(l(x)), \iota(l(y))\} = 0$, so $\iota(l(x)) = \iota(l(y)) = 0$. Hence, $l(x) = l(y) = \infty$
and $x = y = a$ by construction of $l$, which is a contradiction.

(M1) This is true by definition of $d$.

(M3) This is true by symmetry of $\delta$ and, hence, of $d$.

(M5) Let $x, y, z \in X$. Assume without loss of generality that $\iota(l(x)) \le
\iota(l(z))$ so that $d(x,z) = \iota(l(z))$. If $\iota(l(y)) \le \iota(l(z))$, then $d(y,z) = \iota(l(z))$.
If $\iota(l(y)) > \iota(l(z))$, then $d(y,z) = \iota(l(y)) > \iota(l(z))$. In both cases we get
$d(y,z) \ge d(x,z)$, as required.

(5) $(X,d)$ is complete as a metric space. In order to show this, let $(x_n)$ be
a Cauchy sequence in $X$. If $(x_n)$ is eventually constant, then it converges
trivially. So now assume that $(x_n)$ is not eventually constant. We proceed to show
that $x_n$ converges to $a$ in $d$, for which it suffices to show that $(\iota(l(x_n)))_{n \in \mathbb{N}}$
converges to 0. Let $\varepsilon > 0$. Then there exists $n_0 \in \mathbb{N}$ such that for all $m, n \ge n_0$
we have $d(x_m, x_n) < \varepsilon$. In particular, we have $d(x_m, x_{n_0}) < \varepsilon$ for all $m \ge n_0$,
and, since $(x_n)$ is not eventually constant, we thus obtain $\iota(l(x_{n_0})) < \varepsilon$ and
also $\iota(l(x_m)) < \varepsilon$ for all $m \ge n_0$. Since $\varepsilon$ was chosen arbitrarily, we see that
$(\iota(l(x_n)))_{n \in \mathbb{N}}$ converges to 0.

(6) We note that for $f(x) \ne a$, we have $l(f(x)) = l(x) + 1$ by definition of
$l$, and hence $\iota(l(f(x))) = \frac{1}{2} \iota(l(x))$.

(7) For all $x, y \in X$, we have that $d(f(x), f(y)) \le \frac{1}{2} d(x,y)$. In order
to establish this claim, let $x, y \in X$, and assume without loss of generality
that $x \ne y$. Now let $d(x,y) = 2^{-k}$, say, so that $\max\{\iota(l(x)), \iota(l(y))\} = 2^{-k}$.
Then $d(f(x), f(y)) \le \max\{\iota(l(f(x))), \iota(l(f(y)))\} \le \frac{1}{2} \max\{\iota(l(x)), \iota(l(y))\} =
\frac{1}{2} d(x,y)$, as required. ∎
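
The construction in the proof can be carried out mechanically when $X$ is finite. The following Python sketch is an illustration under our own assumptions: the names `build_levels`, `iota`, and `make_distance`, and the six-point example with fixed point $a = 0$, are hypothetical and not part of the text. We think of $X$ as carrying the discrete topology, so that convergence of the iterates to $a$ means that every orbit eventually reaches $a$.

```python
import math

def build_levels(points, f, a):
    """Assign l(a) = infinity and integer levels on every other class T(x),
    following rules (2.1)-(2.3): one step along f raises the level by 1,
    one step into a preimage lowers it by 1."""
    preimages = {y: [z for z in points if f[z] == y] for y in points}
    level = {a: math.inf}
    for start in points:
        if start in level:
            continue
        level[start] = 0              # arbitrary base point of a new class
        stack = [start]
        while stack:
            y = stack.pop()
            if f[y] != a and f[y] not in level:
                level[f[y]] = level[y] + 1
                stack.append(f[y])
            for z in preimages[y]:
                if z not in level:
                    level[z] = level[y] - 1
                    stack.append(z)
    return level

def iota(k):
    """iota(k) = 0 if k is infinite, and 2**(-k) otherwise."""
    return 0.0 if k == math.inf else 2.0 ** (-k)

def make_distance(level):
    """d(x, y) = max(iota(l(x)), iota(l(y))) for x != y, and 0 otherwise."""
    def d(x, y):
        return 0.0 if x == y else max(iota(level[x]), iota(level[y]))
    return d

# Hypothetical example: 0 is the unique fixed point, every orbit reaches it.
points = [0, 1, 2, 3, 4, 5]
f = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 4}
level = build_levels(points, f, a=0)
d = make_distance(level)
```

In this example the classes are $\{0\}$, $\{1,2,3\}$, and $\{4,5\}$, and the resulting $d$ can be checked exhaustively to satisfy (M5) and the inequality of step (7).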


It should be noted that Theorem 4.2.5 is not a true converse of the Banach 
theorem in that we do not start out with a metrizable space and attempt to 
obtain a metric for it relative to which f is a contraction. Thus, Theorem 4.2.5 
is quite different from those discussed, for example, in Section 3.6 of the text 
[Istrățescu, 1981], in which a number of converses of the Banach theorem are 
considered. Even the result of Bessaga discussed there, which applies to an 
abstract set, is very different from ours in that we do not require all iterations 
of f to have a unique fixed point, but we do require topological convergence of 
the iterates of any point. Indeed, we can only make the following observations 
on the relationship between the original topology and the one created by the 
metric constructed in the proof of Theorem 4.2.5. 


4.2.6 Proposition With the notation of the proof of Theorem 4.2.5, the 
following hold. 


(a) Any $x \ne a$ is an isolated point with respect to $d$, that is, $\{x\}$ is open and
closed in the topology generated by $d$.


(b) If $(x_n)$ is a sequence in $X$ which converges in $d$ to some $x \ne a$, then the
sequence $(x_n)$ is eventually constant.

(c) The metric $d$ does not in general generate $\tau$, but the iterates $(f^n(x))$ of
$f$ converge to $a$ both with respect to $\tau$ and with respect to $d$.


Proof: (a) Let $x \ne a$, and let $\iota(l(x)) = 2^{-k}$, say. Then, for any $y \in X$, we have
$\delta(x,y) \ge 2^{-k}$, and hence, for each $y \ne x$, we have $d(x,y) \ge 2^{-k}$. Therefore,
$\{y \in X \mid d(x,y) < 2^{-k}\} = \{x\}$, which is consequently open in $d$. Closedness
is trivial.

(b) In order to see this, it suffices to show that $\{x\}$ is open with respect to
$d$ for any $x \ne a$, which is true by (i) in Step (1) of the proof of Theorem 4.2.5.

(c) Indeed, the topology $\tau$ is not in general metrizable. By the proof of
the Banach contraction mapping theorem, $(f^n(x))$ converges to $a$ with respect
to $d$. Convergence with respect to $\tau$ follows from the hypothesis of Theorem
4.2.5. ∎


4.3 Generalized Ultrametrics 


The first generalization of the standard notion of metric which we consider
is actually obtained from Definition 4.2.1 by replacing the codomain of $\varrho$
(the value set of $\varrho$), namely, the set $\mathbb{R}_0^+$ of non-negative real numbers, by
an arbitrary partially ordered set rather than by relaxing any axioms. This
leads to the notion of “generalized ultrametric” found in parts of algebra
such as valuation theory and first applied to logic programming semantics
by Prieß-Crampe and Ribenboim. Indeed, the main theorem of this section,
Theorem 4.3.6, is due to Prieß-Crampe and Ribenboim.¹⁰


4.3.1 Definition Let $X$ be a set, and let $\Gamma$ be a partially ordered set with
least element $0$. We call $(X, \varrho, \Gamma)$, or simply $(X, \varrho)$, a generalized ultrametric
space (gum) if $\varrho : X \times X \to \Gamma$ is a function such that the following statements
hold for all $x, y, z \in X$ and all $\gamma \in \Gamma$.


(U1) $\varrho(x,x) = 0$.
(U2) If $\varrho(x,y) = 0$, then $x = y$.
(U3) $\varrho(x,y) = \varrho(y,x)$.
(U4) If $\varrho(x,z) \le \gamma$ and $\varrho(z,y) \le \gamma$, then $\varrho(x,y) \le \gamma$.

If $\varrho$ satisfies conditions (U2), (U3), and (U4), but not necessarily (U1), we
call $(X, \varrho)$ a dislocated generalized ultrametric space or simply a d-gum space,


¹⁰The material contained in Section 4.3 up to Theorem 4.3.6 can be found in the following
three papers: [Prieß-Crampe and Ribenboim, 1993, Prieß-Crampe and Ribenboim, 2000a,
Prieß-Crampe and Ribenboim, 2000c].

TABLE 4.2: (Dislocated) generalized ultrametrics: Definition 4.3.1.


notion                                       (U1)  (U2)  (U3)  (U4)
generalized ultrametric (gum)                 x     x     x     x
dislocated generalized ultrametric (d-gum)          x     x     x


see Table 4.2. Condition (U4) will be called the strong triangle inequality for 
gums. We note that any gum is a d-gum. 


4.3.2 Remark It is clear that every ultrametric space is also a generalized
ultrametric space. However, at the level of generality of the previous definition,
the function $\varrho$ this time is not a continuity function, that is, $\Gamma$ need not be
a value semigroup. However, in the applications we will actually consider, $\Gamma$
will be a value semigroup, and $\varrho$ will indeed be a continuity function, and we
consider this point next.

Let $\gamma > 0$ denote an arbitrary ordinal, and denote by $\Gamma_\gamma$ the set
$\{2^{-\alpha} \mid \alpha < \gamma\}$ of symbols $2^{-\alpha}$. Then $\Gamma_\gamma$ is totally ordered by $2^{-\alpha} < 2^{-\beta}$ if and only
if $\beta < \alpha$. Notice that $\Gamma_\gamma$ is really nothing other than $\gamma$ endowed with the dual
of the usual ordering on ordinals, but it is convenient to use the symbols $2^{-\alpha}$
rather than the symbols $\alpha$ to denote typical elements, as will be seen later in
Section 4.8.2 and beyond. Notice also, as is commonly done, that we view an
ordinal $\gamma$ as the set of all ordinals $\eta$ such that $\eta \in \gamma$, that is, as the set of
ordinals $\eta$ such that $\eta < \gamma$. Finally, we define the binary operation $+$ on $\Gamma_\gamma$
by

\[ 2^{-\alpha} + 2^{-\beta} = \max\{2^{-\alpha}, 2^{-\beta}\}, \]

noting that $2^{-0}$ is an absorbing element for this operation. In particular,
applying this construction to the ordinal $\gamma + 1$, we note that $2^{-\gamma}$ is both the
bottom element of $\Gamma_{\gamma+1}$ and the identity element for the operation $+$ defined
on $\Gamma_{\gamma+1}$. Furthermore, $2^{-\gamma} \ne 2^{-0}$ since $\gamma > 0$, where $0$ denotes the finite
limit ordinal zero, and we note that we will sometimes also use $0$ to denote
$2^{-\gamma}$ where this does not cause confusion. Then $\Gamma_{\gamma+1}$ is a value semigroup in
which $\frac{a}{2} = a$, where $a = 2^{-\alpha}$ denotes a typical element of $\Gamma_{\gamma+1}$, and moreover,
the partial order induced on $\Gamma_{\gamma+1}$ by $+$ coincides with that already defined.
Furthermore, the set $\{2^{-\alpha} \mid \alpha < \gamma\}$ is a set of positives in $\Gamma_{\gamma+1}$. It is the case
$\Gamma = \Gamma_{\gamma+1}$ which is of most interest to us. Therefore, in these cases of most
interest, $(X, \varrho, \Gamma)$ is a continuity space. In fact, we shall take these points
further later on in this chapter by turning a domain $(D, \sqsubseteq)$ into a generalized
ultrametric space, see Sections 4.8.2 and 4.8.3 (and also Section 5.1.1).


The following definitions prepare the way for the main result of this sec- 
tion, namely, Theorem 4.3.6, which provides the main fixed-point theorem 
applicable to gums. We note that the requisite form of completeness here is 




that of spherical completeness, defined next, and that the next two definitions 
and the following lemma apply to gums as a special case of d-gums. 


4.3.3 Definition Let $(X, \varrho, \Gamma)$ be a d-gum space. For $0 \ne \gamma \in \Gamma$ and $x \in X$,
the set $B_\gamma(x) = \{y \in X \mid \varrho(x,y) \le \gamma\}$ is called a ($\gamma$-)ball in $X$ with centre or
midpoint $x$. A d-gum space is called spherically complete if, for any chain $\mathcal{C}$,
with respect to set-inclusion, of non-empty balls in $X$ we have $\bigcap \mathcal{C} \ne \emptyset$.


The stipulation in the definition of spherical completeness that all balls be 
non-empty can be dropped when working in a gum rather than in a d-gum, 
since in the former case all balls are clearly non-empty. 


4.3.4 Definition Let $(X, \varrho, \Gamma)$ be a d-gum space, and let $f : X \to X$ be a
function.


(1) $f$ is called non-expanding if $\varrho(f(x), f(y)) \le \varrho(x,y)$ for all $x, y \in X$.


(2) $f$ is called strictly contracting on orbits¹¹ if $\varrho(f^2(x), f(x)) < \varrho(f(x), x)$
for every $x \in X$ with $x \ne f(x)$.


(3) $f$ is called strictly contracting (on $X$) if $\varrho(f(x), f(y)) < \varrho(x,y)$ for all
$x, y \in X$ with $x \ne y$.


We will need the following observations, which are well-known for ordinary 
ultrametric spaces. 


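4.3.5 Lemma Let $(X, \varrho, \Gamma)$ be a d-gum space. For $\alpha, \beta \in \Gamma$ and $x, y \in X$,
the following statements hold.


(a) If $\alpha \le \beta$ and $B_\alpha(x) \cap B_\beta(y) \ne \emptyset$, then $B_\alpha(x) \subseteq B_\beta(y)$.


(b) If $B_\alpha(x) \cap B_\alpha(y) \ne \emptyset$, then $B_\alpha(x) = B_\alpha(y)$. In particular, each element
of a ball is also its centre.


(c) $B_{\varrho(x,y)}(x) = B_{\varrho(x,y)}(y)$.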


Proof: Let $a \in B_\alpha(x)$, and let $b \in B_\alpha(x) \cap B_\beta(y)$. Then $\varrho(a,x) \le \alpha$ and
$\varrho(b,x) \le \alpha$; hence, $\varrho(a,b) \le \alpha \le \beta$. Since $\varrho(b,y) \le \beta$, we have $\varrho(a,y) \le \beta$
and, hence, $a \in B_\beta(y)$, and this proves the first statement. The second follows
by symmetry and the third by taking $\alpha = \varrho(x,y)$ and applying (b). ∎
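
The three statements of the lemma can be checked exhaustively on a small ultrametric space. The following Python sketch is an illustration only: the space of binary strings of length at most three with the common-prefix distance, and the names `rho`, `ball`, and `lemma_holds`, are our own assumptions and are not taken from the text.

```python
from fractions import Fraction
from itertools import product

def rho(u, v):
    """Common-prefix ultrametric: 0 if u == v, and otherwise 2**(-k),
    where k is the length of the longest common prefix of u and v."""
    if u == v:
        return Fraction(0)
    k = 0
    for a, b in zip(u, v):
        if a != b:
            break
        k += 1
    return Fraction(1, 2 ** k)

# All non-empty binary strings of length at most 3.
X = ["".join(bits) for n in range(1, 4) for bits in product("01", repeat=n)]

def ball(x, alpha):
    """B_alpha(x) = {y | rho(x, y) <= alpha}."""
    return frozenset(y for y in X if rho(x, y) <= alpha)

radii = sorted({rho(u, v) for u in X for v in X if u != v})

def lemma_holds():
    """Check statements (a), (b), and (c) for all centres and radii."""
    for x, y in product(X, repeat=2):
        for alpha, beta in product(radii, repeat=2):
            A, B = ball(x, alpha), ball(y, beta)
            if alpha <= beta and A & B and not A <= B:
                return False          # (a) fails
            if alpha == beta and A & B and A != B:
                return False          # (b) fails
        if x != y and ball(x, rho(x, y)) != ball(y, rho(x, y)):
            return False              # (c) fails
    return True

result = lemma_holds()
```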


The following theorem is the analogue of the Banach contraction mapping 
theorem applicable to generalized ultrametrics.¹² It will be proved later by
virtue of proving the more general Theorem 4.5.1. 


¹¹An orbit of $f$ is a subset of $X$ of the form $\{f^n(x) \mid n \in \mathbb{N}\}$ for some $x \in X$.

¹²Theorem 4.3.6 can be found in [Prieß-Crampe and Ribenboim, 2000c]. An earlier and
less general version appeared in [Prieß-Crampe, 1990].

4.3.6 Theorem (Prieß-Crampe and Ribenboim) Let $(X, \varrho, \Gamma)$ be a
spherically complete generalized ultrametric space, and let $f : X \to X$ be
non-expanding and strictly contracting on orbits. Then $f$ has a fixed point.
Moreover, if $f$ is strictly contracting on $X$, then $f$ has a unique fixed point.


Note that every compact ultrametric space is spherically complete by the
finite intersection property. The converse is not true: let $X$ be an infinite set,
and let $d$ be the ultrametric defined by setting $d(x,y) = 1$ if $x \ne y$ and
taking $d(x,x) = 0$ for all $x \in X$. Then $(X,d)$ is not compact but is spherically
complete.

The relationship between spherical completeness and completeness is given
by the next proposition.¹³


4.3.7 Proposition Let (X,d) be an ultrametric space. If X is spherically 
complete, then it is complete. The converse does not hold in general. 


Proof: Assume that $(X,d)$ is spherically complete and that $(x_n)$ is a Cauchy
sequence in $(X,d)$. Then, for every $k \in \mathbb{N}$, there exists a least $n_k \in \mathbb{N}$ such
that for all $n, m \ge n_k$ we have $d(x_n, x_m) \le \frac{1}{k}$. We note that $n_k$ increases
with $k$. Now consider the set of balls $\mathcal{B} = \{B_{1/k}(x_{n_k}) \mid k \in \mathbb{N}\}$. By (U4), $\mathcal{B}$
is a decreasing chain of balls and has non-empty intersection $B$ by spherical
completeness of $(X,d)$. Let $a \in B$. Then it is easy to see that $(x_n)$ converges
to $a$. Hence, $B = \{a\}$ is a one-point set since limits in $(X,d)$ are unique.
Therefore, $(X,d)$ is complete.

In order to show that the converse does not hold in general, define an
ultrametric $d$ on $\mathbb{N}$ as follows. For $n, m \in \mathbb{N}$, let $d(n,m) = 1 + 2^{-\min\{n,m\}}$
if $n \ne m$, and set $d(n,n) = 0$ for all $n \in \mathbb{N}$. The topology induced by $d$ is
the discrete topology on $\mathbb{N}$, and the Cauchy sequences with respect to $d$ are
exactly the sequences which are eventually constant; hence, $(\mathbb{N}, d)$ is complete.
Now consider the chain of balls $B_n$ of the form $\{m \in \mathbb{N} \mid d(m,n) \le 1 + 2^{-n}\}$.
Then we obtain $B_n = \{m \mid m \ge n\}$ for all $n \in \mathbb{N}$. Hence, $\bigcap B_n = \emptyset$. ∎


Note also that, with the notation from the second part of the proof, the 
successor function $n \mapsto n + 1$ is strictly contracting, but does not have a fixed
point. By Proposition 4.3.7 and the remarks preceding it, we see that the 
notion of spherical completeness is strictly less general than completeness and 
is strictly more general than compactness. 
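
The counterexample from the second part of the proof of Proposition 4.3.7 can be inspected on a finite window of $\mathbb{N}$. The following Python sketch is illustrative and rests on our own assumptions: we read the distance as $d(n,m) = 1 + 2^{-\min\{n,m\}}$ for $n \ne m$, we take $\mathbb{N}$ to start at $0$, and the names `d`, `ball`, and `window` are hypothetical.

```python
from fractions import Fraction

def d(n, m):
    """d(n, m) = 1 + 2**(-min(n, m)) for n != m, and 0 on the diagonal."""
    return Fraction(0) if n == m else 1 + Fraction(1, 2 ** min(n, m))

def ball(n, window):
    """B_n = {m | d(m, n) <= 1 + 2**(-n)}, restricted to a finite window."""
    radius = 1 + Fraction(1, 2 ** n)
    return {m for m in window if d(m, n) <= radius}

window = range(0, 21)

# Within the window, each ball B_n is the tail {m | m >= n}, so the balls
# form a decreasing chain from which every fixed m is eventually excluded.
balls_are_tails = all(ball(n, window) == set(range(n, 21)) for n in range(10))

# The successor function shortens every non-zero distance strictly.
successor_contracts = all(d(n + 1, m + 1) < d(n, m)
                          for n in range(20) for m in range(20) if n != m)
```

Since each $B_n$ is the tail $\{m \mid m \ge n\}$, no natural number lies in all of them, which is why the chain has empty intersection in $\mathbb{N}$ even though every finite subfamily intersects.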

Spherical completeness can also be characterized by means of transfinite
sequences, and we consider this next.¹⁴


¹³Similar studies of this issue have been undertaken in [Prieß-Crampe, 1990] in the case of
totally ordered distance sets. The topology of generalized ultrametric spaces is investigated
in [Heckmanns, 1996].

¹⁴Here, we follow a line of thought developed in [Prieß-Crampe, 1990], only slightly
changed (the original version was established under the assumption that the distance sets
in question were linearly ordered) and with the proofs adapted to the more general setting.

4.3.8 Definition Let $(x_\delta)_{\delta<\eta}$ be a (possibly transfinite) sequence of elements
of a gum $(X, \varrho, \Gamma)$. Then $(x_\delta)$ is said to be pseudo-convergent if, for all
$\alpha < \beta < \gamma < \eta$, we have $\varrho(x_\beta, x_\gamma) < \varrho(x_\alpha, x_\beta)$. The transfinite sequence
$(\pi_\delta)_{\delta+1<\eta}$ with $\pi_\delta = \varrho(x_\delta, x_{\delta+1})$ is then strictly monotonic decreasing. If $\eta$ is a
limit ordinal, then any $x \in X$ with $\varrho(x, x_\delta) \le \pi_\delta$ for all $\delta < \eta$ is called a
pseudo-limit of the transfinite sequence $(x_\delta)_{\delta<\eta}$.

The space $(X, \varrho, \Gamma)$ is called trans-complete if every pseudo-convergent
transfinite sequence $(x_\delta)_{\delta<\eta}$, where $\eta$ is a limit ordinal, has a pseudo-limit in
$X$.


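4.3.9 Proposition Suppose that $x$ is a pseudo-limit of $(x_\delta)_{\delta<\eta}$, where $\eta$ is a
limit ordinal. Then the set of all pseudo-limits of $(x_\delta)$ is given by
$\mathrm{Lim}(x_\delta) = \{z \in X \mid \varrho(x,z) < \pi_\delta \text{ for all } \delta < \eta\}$.


Proof: Let $z \in \mathrm{Lim}(x_\delta)$. Since $\varrho(z,x) < \pi_\delta$ and $\varrho(x, x_\delta) \le \pi_\delta$, we obtain
$\varrho(z, x_\delta) \le \pi_\delta$ for all $\delta$, and hence $z$ is a pseudo-limit. Conversely, let $z$ be
a pseudo-limit of $(x_\delta)$. Since $\varrho(x, x_{\delta+1}), \varrho(z, x_{\delta+1}) \le \pi_{\delta+1}$ for all $\delta < \eta$, we
obtain $\varrho(x,z) \le \pi_{\delta+1} < \pi_\delta$ for all $\delta < \eta$, as required. ∎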


4.3.10 Proposition A generalized ultrametric space is spherically complete 
if and only if it is trans-complete. 


Proof: Let $X$ be trans-complete, and let $\mathcal{B}$ be a decreasing chain of balls in $X$.
Without loss of generality, assume that $\mathcal{B}$ does not have a minimal element and
is, in fact, strictly decreasing. Then we can select a coinitial subchain $(B_\delta)_{\delta<\eta}$
of $\mathcal{B}$, where $\eta$ is a limit ordinal, so that $(B_\delta)_{\delta<\eta}$ is a transfinite sequence of
balls. Since this transfinite sequence is strictly decreasing, we know that for
every $\delta$ there exists $x_\delta \in B_\delta \setminus B_{\delta+1}$, and the transfinite sequence $(x_\delta)_{\delta<\eta}$ is
pseudo-convergent; hence, it has a pseudo-limit $x$. Since $\varrho(x, x_\delta) \le \varrho(x_\delta, x_{\delta+1})$
and $x_\delta, x_{\delta+1} \in B_\delta$, we obtain $x \in B_\delta$ for all $\delta$, and therefore, $x \in \bigcap \mathcal{B}$.
Conversely, let $X$ be spherically complete, and let $(x_\delta)$ be pseudo-convergent.
Let $\pi_\delta = \varrho(x_\delta, x_{\delta+1})$, and let $B_\delta = B_{\pi_\delta}(x_\delta)$. For $\alpha < \beta$, we
have that $x_\beta \in B_\alpha \cap B_\beta$, and therefore $(B_\delta)$ is a decreasing chain of balls by
Lemma 4.3.5. By spherical completeness, there is some $x \in \bigcap B_\delta$, and it is
immediate that $x$ is a pseudo-limit of $(x_\delta)$. ∎


We close this section by considering briefly how pseudo-convergent sequences
may be generated when the set $\Gamma$ is linearly ordered. Thus, in what
follows, let $(X, \varrho, \Gamma)$ be a generalized ultrametric space in which $\Gamma$ is a linearly
ordered set.


4.3.11 Lemma Let $x, y, z \in X$ with $\varrho(x, y) < \varrho(y, z)$. Then $\varrho(x, z) = \varrho(y, z)$.


Proof: We have $\varrho(x, z) \le \max\{\varrho(x, y), \varrho(y, z)\} \le \varrho(y, z)$ on using the strong
triangle inequality. Now assume $\varrho(y, z) \not\le \varrho(x, z)$. Then, because $\Gamma$ is linearly
ordered, we have $\varrho(x, z) < \varrho(y, z)$, and by the strong triangle inequality again
we obtain $\varrho(y, z) \le \max\{\varrho(x, y), \varrho(x, z)\} < \varrho(y, z)$, which is impossible. ∎


4.3.12 Lemma Let $n \ge 2$, and suppose that $(x_1, x_2, \ldots, x_n)$ is an $n$-tuple of
elements of $X$ satisfying $\varrho(x_{i+1}, x_{i+2}) < \varrho(x_i, x_{i+1})$ for $i = 1, \ldots, n-2$. Then
$\varrho(x_1, x_n) = \varrho(x_1, x_2)$.


Proof: We show by induction on $n$ that the identity $\varrho(x_1, x_2) = \varrho(x_1, x_n)$
holds. This is trivial for $n = 2$. So assume $n > 2$ and that the assertion
holds for $n - 1$. Then $\varrho(x_1, x_2) = \varrho(x_1, x_{n-1})$, and consequently $\varrho(x_{n-1}, x_n) <
\varrho(x_1, x_2) = \varrho(x_1, x_{n-1})$. So Lemma 4.3.11 applies to the points $x_1$, $x_{n-1}$ and
$x_n$ and gives $\varrho(x_1, x_n) = \varrho(x_1, x_{n-1}) = \varrho(x_1, x_2)$, as required. ∎


We can now establish the following result. 


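4.3.13 Proposition Let $(X, \varrho, \Gamma)$ be a generalized ultrametric space in which
$\Gamma$ is a linearly ordered set. Furthermore, let $f : X \to X$ be strictly contracting,
let $x_0 \in X$, and let $x_i = f^i(x_0)$ for all $i < \omega$. Then the sequence $(x_i)_{i<\omega}$ is
pseudo-convergent.


Proof: Let $\alpha < \beta < \gamma < \omega$, and note then that $(x_\alpha, x_{\alpha+1}, \ldots, x_\beta, \ldots, x_\gamma)$
satisfies the hypothesis of Lemma 4.3.12 because $f$ is strictly contracting.
So we obtain $\varrho(x_\alpha, x_\beta) = \varrho(x_\alpha, x_{\alpha+1})$ and $\varrho(x_\beta, x_\gamma) = \varrho(x_\beta, x_{\beta+1})$. Thus,
$\varrho(x_\beta, x_\gamma) = \varrho(x_\beta, x_{\beta+1}) < \varrho(x_\alpha, x_{\alpha+1}) = \varrho(x_\alpha, x_\beta)$, as desired. ∎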
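
As an illustration of Proposition 4.3.13, the following sketch checks pseudo-convergence numerically on a finite piece of an orbit. The choice of space is ours and is not taken from the text: we assume the integers with the 2-adic ultrametric (whose distance set is linearly ordered) and the map $x \mapsto 2x + 1$, which halves every non-zero distance. The names `two_adic_distance`, `f`, `orbit` and `is_pseudo_convergent` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import combinations


def two_adic_distance(x: int, y: int) -> Fraction:
    """2-adic ultrametric on the integers: 2**(-v), where 2**v exactly divides x - y."""
    if x == y:
        return Fraction(0)
    diff = abs(x - y)
    valuation = (diff & -diff).bit_length() - 1  # number of trailing zero bits
    return Fraction(1, 2 ** valuation)


def f(x: int) -> int:
    """Assumed example map; it halves every positive 2-adic distance."""
    return 2 * x + 1


def orbit(x0: int, length: int) -> list:
    """The finite initial segment x0, f(x0), f(f(x0)), ... of the orbit of x0."""
    points = [x0]
    for _ in range(length - 1):
        points.append(f(points[-1]))
    return points


def is_pseudo_convergent(points, dist) -> bool:
    """Check dist(x_b, x_c) < dist(x_a, x_b) for all indices a < b < c."""
    return all(
        dist(points[b], points[c]) < dist(points[a], points[b])
        for a, b, c in combinations(range(len(points)), 3)
    )


xs = orbit(0, 8)
pseudo_convergent = is_pseudo_convergent(xs, two_adic_distance)
```

The check only covers finitely many indices, so it illustrates the statement rather than proving it; the identity of Lemma 4.3.12 can be observed on the same data by comparing `two_adic_distance(xs[1], xs[5])` with `two_adic_distance(xs[1], xs[2])`.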


4.4 Dislocated Metrics 


Dislocated metrics were first studied by S.G. Matthews under the name of 
metric domains in the context of Kahn’s dataflow model.¹⁵ We proceed now
with the definitions needed for stating the main theorem of Matthews, which, 
in fact, is the form of the Banach contraction mapping theorem applicable to 
these spaces. Thus, we will define the notions of convergence, Cauchy sequence, 
and completeness for dislocated metrics. As it turns out, these notions can be 
carried over directly from the corresponding conventional ones. 


15The contents of Section 4.4, including Theorem 4.4.6, can be found in [Matthews, 1986]. 
Matthews and other authors have argued that the slightly less general notion of (weak) 
partial metric is more appropriate than that of dislocated metric from a domain-theoretic 
point of view. We refer the reader to [Matthews, 1994, Heckmann, 1999, Waszkiewicz, 2002] 
for an account of this, since we have no direct need of it, and indeed dislocated metrics are 
well-suited to our purposes. 
4.4.1 Definition A sequence $(x_n)$ in a d-metric space $(X, \varrho)$ converges with
respect to $\varrho$ or in $\varrho$ if there exists $x \in X$ such that $\varrho(x_n, x)$ converges to 0 as
$n \to \infty$. In this case, $x$ is called a limit of $(x_n)$ in $\varrho$.


4.4.2 Proposition Limits in d-metric spaces are unique. 


Proof: Let $x$ and $y$ be limits of the sequence $(x_n)$ in a d-metric space $(X, \varrho)$.
By properties (M3) and (M4) of Definition 4.2.1, it follows that $\varrho(x, y) \le
\varrho(x_n, x) + \varrho(x_n, y) \to 0$ as $n \to \infty$. Hence, $\varrho(x, y) = 0$, and by property (M2)
of Definition 4.2.1, we obtain $x = y$. ∎


4.4.3 Definition A sequence $(x_n)$ in a d-metric space $(X, \varrho)$ is called a
Cauchy sequence if, for each $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that for all
$m, n \ge n_0$ we have $\varrho(x_m, x_n) < \varepsilon$.


4.4.4 Proposition Every convergent sequence in a d-metric space is a 
Cauchy sequence. 


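Proof: Let $(x_n)$ be a sequence which converges to some $x$ in a d-metric space
$(X, \varrho)$, and let $\varepsilon > 0$ be chosen arbitrarily. Then there exists $n_0 \in \mathbb{N}$ with
$\varrho(x_n, x) < \frac{\varepsilon}{2}$ for all $n \ge n_0$. For $m, n \ge n_0$, we then obtain $\varrho(x_m, x_n) \le
\varrho(x_m, x) + \varrho(x, x_n) < 2 \cdot \frac{\varepsilon}{2} = \varepsilon$. Hence, $(x_n)$ is a Cauchy sequence. ∎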


4.4.5 Definition A d-metric space $(X, \varrho)$ is called complete if every Cauchy
sequence in $X$ converges with respect to $\varrho$. Furthermore, a function $f : X \to X$
is called a contraction if there exists $0 \le \lambda < 1$ such that $\varrho(f(x), f(y)) \le
\lambda \varrho(x, y)$ for all $x, y \in X$.


4.4.6 Theorem (Matthews’ theorem) Let $(X, \varrho)$ be a complete d-metric
space, and let $f : X \to X$ be a contraction. Then $f$ has a unique fixed point.


Proof: The proof follows the pattern of the proof of Theorem 4.2.3. Indeed,
Parts (1) and (3) of that proof do not make use of condition (M1) and therefore
can be carried over literally. Part (2), however, needs to be modified since
we do not have a suitable notion of topological convergence available for
dislocated metric spaces.¹⁶ With the notation from the proof of Theorem 4.2.3,
so that $x$ denotes the limit of the Cauchy sequence $(f^n(y))$, we make the


16It is possible to carry over the complete proof of Theorem 4.2.3, but the constructions 
needed are rather involved. Details can be found in [Hitzler and Seda, 2000, Hitzler, 2001]; 
see also [Hitzler and Seda, 2003]. 
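following calculations for all $n \in \mathbb{N}$, using (M4) twice together with the fact that a
contraction is non-expanding, and then the contraction property:

\[
\begin{aligned}
\varrho(f(x), x) &\le \varrho(f(x), f^{n+1}(x)) + \varrho(f^{n+1}(x), x) \\
&\le \varrho(x, f^n(y)) + \varrho(f^n(y), f^n(x)) + \varrho(f^{n+1}(x), f^{n+1}(y)) + \varrho(f^{n+1}(y), x) \\
&\le \varrho(x, f^n(y)) + \lambda^n \varrho(y, x) + \lambda^{n+1} \varrho(x, y) + \varrho(f^{n+1}(y), x).
\end{aligned}
\]

Since all four terms in the last line converge to 0 as $n \to \infty$, we obtain
$\varrho(f(x), x) = 0$, and therefore $f(x) = x$ by (M3) and (M2). ∎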
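
To see Matthews' theorem at work on a concrete case, the sketch below iterates a contraction on a dislocated metric space. The example is an assumption of ours: we take the non-negative reals with $\varrho(x, y) = \max\{x, y\}$, which satisfies (M2) to (M4) but has $\varrho(x, x) = x \neq 0$ in general, and the map $x \mapsto x/2$. The names `rho`, `f` and `iterate` are hypothetical.

```python
def rho(x: float, y: float) -> float:
    """Dislocated metric max{x, y} on the non-negative reals; note rho(x, x) = x."""
    return max(x, y)


def f(x: float) -> float:
    """Assumed example map; a contraction with factor 1/2 with respect to rho."""
    return x / 2.0


def iterate(x0: float, steps: int) -> list:
    """Return the list x0, f(x0), ..., f^steps(x0)."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(f(orbit[-1]))
    return orbit


orbit = iterate(8.0, 60)
fixed_point = 0.0
# The distances rho(f^n(x0), fixed_point) shrink geometrically towards 0.
distances_to_fixed_point = [rho(x, fixed_point) for x in orbit]
```

In this example the fixed point 0 is the only point with self-distance 0, which is consistent with the observation that a limit $x$ of a Cauchy sequence in a d-metric space must satisfy $\varrho(x, x) = 0$.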


4.5 Dislocated Generalized Ultrametrics 


The following theorem gives a partial unification of Matthews’ theorem,
Theorem 4.4.6, and the Prieß-Crampe and Ribenboim theorem,
Theorem 4.3.6.¹⁷


4.5.1 Theorem Let $(X, \varrho, \Gamma)$ be a spherically complete d-gum, and let
$f : X \to X$ be non-expanding and strictly contracting on orbits. Then $f$ has a
fixed point. If $f$ is strictly contracting on $X$, then the fixed point is unique.


Proof: Assume that $f$ has no fixed point. Then for all $x \in X$, we have
$\varrho(x, f(x)) \neq 0$. We now define the set $\mathcal{B}$ by $\mathcal{B} = \{B_{\varrho(x, f(x))}(x) \mid x \in X\}$, and
note that each ball in this set is non-empty. We also note that $B_{\varrho(x, f(x))}(x) =
B_{\varrho(x, f(x))}(f(x))$ by Lemma 4.3.5. Now let $\mathcal{C}$ be a maximal chain in $\mathcal{B}$. Since
$X$ is spherically complete, there exists $z \in \bigcap \mathcal{C}$. We show that $B_{\varrho(z, f(z))}(z) \subseteq
B_{\varrho(x, f(x))}(x)$ for all $x \in X$ with $B_{\varrho(x, f(x))}(x) \in \mathcal{C}$ and, hence, by maximality, that
$B_{\varrho(z, f(z))}(z)$ is the smallest ball in the chain. Let $B_{\varrho(x, f(x))}(x) \in \mathcal{C}$. Since
$z \in B_{\varrho(x, f(x))}(x)$, and noting our earlier observation that
$B_{\varrho(x, f(x))}(x) = B_{\varrho(x, f(x))}(f(x))$ for all $x$,
we get $\varrho(z, x) \le \varrho(x, f(x))$ and $\varrho(z, f(x)) \le \varrho(x, f(x))$. By non-expansiveness
of $f$, we get $\varrho(f(z), f(x)) \le \varrho(z, x) \le \varrho(x, f(x))$. It follows by (U4) that
$\varrho(z, f(z)) \le \varrho(x, f(x))$ and therefore by Lemma 4.3.5 that $B_{\varrho(z, f(z))}(z) \subseteq
B_{\varrho(x, f(x))}(x)$ for every such ball, since $x$ was chosen arbitrarily. Now, since $f$ is
strictly contracting on orbits, $\varrho(f(z), f^2(z)) < \varrho(z, f(z))$, and therefore
$z \notin B_{\varrho(f(z), f^2(z))}(f(z)) \subset B_{\varrho(z, f(z))}(f(z))$. By Lemma 4.3.5, this is equivalent to
$B_{\varrho(f(z), f^2(z))}(f(z)) \subset B_{\varrho(z, f(z))}(z)$, which is a contradiction to the maximality
of $\mathcal{C}$. So $f$ has a fixed point.


17The proof of Theorem 4.4.6 given here is, in fact, identical to that of Theorem 4.3.6
from [Prieß-Crampe and Ribenboim, 1993].
Now let $f$ be strictly contracting on $X$, and assume that $x$ and $y$ are two
distinct fixed points of $f$. Then we get $\varrho(x, y) = \varrho(f(x), f(y)) < \varrho(x, y)$, which
is impossible. So the fixed point of $f$ is unique in this case. ∎


We next give an iterative proof of a special case of Theorem 4.5.1. 


4.5.2 Theorem Let $(X, \varrho, \Gamma)$ be a spherically complete, dislocated generalized
ultrametric space with $\Gamma = \{2^{-\alpha} \mid \alpha \le \gamma\}$ for some ordinal $\gamma$. We order
$\Gamma$ by $2^{-\alpha} < 2^{-\beta}$ if and only if $\beta < \alpha$, and denote $2^{-\gamma}$ by 0. Thus, $\Gamma$ is the set
$\Gamma_{\gamma+1}$ of Remark 4.3.2. If $f : X \to X$ is any strictly contracting function on
$X$, then $f$ has a unique fixed point.


Proof: Let $x \in X$. Then we have $f(x) \in f(X)$ and $\varrho(f(x), x) \le 2^{-0}$,
since $2^{-0}$ is the maximum possible distance between any two points in $X$.
Now, $\varrho(f(f(x)), f(x)) \le 2^{-1} < 2^{-0}$ since $f$ is strictly contracting, and by
(U4), it follows that $\varrho(f^2(x), x) \le 2^{-0}$. By the same argument, we obtain
$\varrho(f^3(x), f^2(x)) \le 2^{-2} < 2^{-1}$, and therefore $\varrho(f^3(x), f(x)) \le 2^{-1}$. In fact, an
easy induction argument along these lines shows that $\varrho(f^{n+1}(x), f^m(x)) \le
2^{-m}$ for $m \le n$. Again by (U4), we obtain that the sequence of balls of the
form $B_{2^{-n}}(f^n(x))$ is a descending chain (with respect to set-inclusion) if $n$ is
increasing and, therefore, has non-empty intersection $B_\omega$ since $X$ is assumed to
be spherically complete. We therefore conclude that there is $x_\omega \in B_\omega$ with
$\varrho(x_\omega, f^n(x)) \le 2^{-n}$ for each $n \in \mathbb{N}$.

Next, for each $n \in \mathbb{N}$, we argue as follows. Since $\varrho(f(x_\omega), f^{n+1}(x)) \le
\varrho(x_\omega, f^n(x)) \le 2^{-n}$ and also $\varrho(x_\omega, f^{n+1}(x)) \le 2^{-(n+1)} < 2^{-n}$, we therefore
obtain $\varrho(f(x_\omega), x_\omega) \le 2^{-n}$. Since this is the case for all $n \in \mathbb{N}$, it follows that
$\varrho(f(x_\omega), x_\omega) \le 2^{-\omega}$.

It is straightforward to cast the above observations into a transfinite induction
argument, and we obtain the following construction. Choose $x \in X$
arbitrarily. For each ordinal $\alpha \le \gamma$, we define $f^\alpha(x)$ as follows. If $\alpha$ is a successor
ordinal, then $f^\alpha(x) = f(f^{\alpha-1}(x))$, as usual. If $\alpha$ is a limit ordinal, then we
choose $f^\alpha(x)$ as some $x_\alpha$ which has the property that $\varrho(x_\alpha, f^\beta(x)) \le 2^{-\beta}$ for all
$\beta < \alpha$, noting that the existence of such an $x_\alpha$ is guaranteed by spherical completeness
of $X$.

The resulting transfinite sequence $f^\alpha(x)$ has the property that, for all
$\alpha \le \gamma$, $\varrho(f^{\alpha+1}(x), f^\alpha(x)) \le 2^{-\alpha}$. Consequently, $\varrho(f^{\gamma+1}(x), f^\gamma(x)) = 2^{-\gamma} = 0$,
and therefore $f^\gamma(x)$ must be a fixed point of $f$.

Finally, $x_\gamma = f^\gamma(x)$ can be the only fixed point of $f$. To see this, suppose
$y \neq x_\gamma$ is another fixed point of $f$. Then we obtain $\varrho(y, x_\gamma) = \varrho(f(y), f(x_\gamma)) <
\varrho(y, x_\gamma)$, from the fact that $f$ is strictly contracting, and this is impossible. ∎
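
The iterative argument can be traced on a small finite instance. The following sketch is an illustration under assumptions of our own: $X$ is the set of bit-tuples of a fixed length `N` (so `N` plays the role of the ordinal $\gamma$ and no limit step is needed), the distance is $2^{-k}$ for the first position $k$ at which two tuples differ, and the hypothetical map `f` computes each output bit from strictly earlier input bits. This distance is in fact an ordinary ultrametric, which is a special case of a dislocated one.

```python
from fractions import Fraction
from itertools import product

N = 6  # points of X are bit-tuples of length N; N plays the role of gamma


def rho(x: tuple, y: tuple) -> Fraction:
    """2**(-k) for the first index k at which x and y differ; 0 (= 2**(-N)) if x == y."""
    for k in range(N):
        if x[k] != y[k]:
            return Fraction(1, 2 ** k)
    return Fraction(0)


def f(x: tuple) -> tuple:
    """Bit k of f(x) depends only on bit k - 1 of x, so f is strictly contracting."""
    return (1,) + tuple(1 - x[k - 1] for k in range(1, N))


def iterate_to_fixed_point(x: tuple) -> tuple:
    """Apply f exactly N times; each application fixes one further position."""
    for _ in range(N):
        x = f(x)
    return x


X = list(product((0, 1), repeat=N))
fixed_points = [x for x in X if f(x) == x]
reached = {iterate_to_fixed_point(x) for x in X}
```

Every starting point is driven to the same tuple, which matches the uniqueness part of the theorem in this finite setting.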


106 Mathematical Aspects of Logic Programming Semantics 


4.6 Quasimetrics 


Quasimetrics are a convenient way of reconciling metric and order struc- 
tures, see Example 4.6.4. We give the relevant definitions in order to state and 
prove the Rutten-Smyth theorem,¹⁸ which is the appropriate analogue of the
Banach theorem for quasimetric spaces. 


4.6.1 Definition A sequence $(x_n)$ in a quasimetric space $(X, d)$ is a (forward)
Cauchy sequence if, for all $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that for all
$n \ge m \ge n_0$ we have $d(x_m, x_n) < \varepsilon$. A Cauchy sequence $(x_n)$ converges to $x \in X$
if, for all $y \in X$, $d(x, y) = \lim_{n \to \infty} d(x_n, y)$. Finally, $X$ is called CS-complete
if every Cauchy sequence in $X$ converges.


Note that limits of Cauchy sequences in quasimetric spaces are unique.
Given a quasimetric space $(X, d)$, $d$ induces a partial order $\le_d$ on $X$, called
the partial order induced by $d$, by setting $x \le_d y$ if and only if $d(x, y) = 0$.
Furthermore, if $(X, d)$ is a quasimetric space, then $(X, d^*)$ is a metric space,
where $d^*(x, y) = \max\{d(x, y), d(y, x)\}$, and $d^*$ is called the metric induced by
$d$. We call a quasimetric space $(X, d)$ totally bounded if for every $\varepsilon > 0$ there
exists a finite set $E \subseteq X$ such that for every $y \in X$ there is an $e \in E$ with
$d^*(e, y) < \varepsilon$.


4.6.2 Definition Let $X$ be a quasimetric space, and let $f : X \to X$ be a
function.


(1) $f$ is called CS-continuous if, for all Cauchy sequences $(x_n)$ in $X$ which
converge to $x$, $(f(x_n))$ is a Cauchy sequence which converges to $f(x)$.


(2) $f$ is called non-expanding if $d(f(x), f(y)) \le d(x, y)$ for all $x, y \in X$.


(3) $f$ is called contractive if there exists some $c$ with $0 \le c < 1$ such that
$d(f(x), f(y)) \le c \cdot d(x, y)$ for all $x, y \in X$.


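Contractive mappings are not necessarily CS-continuous: consider the set
$\mathbb{N} \cup \{\infty\}$ with the natural order and the distance function

\[
d(x, y) =
\begin{cases}
0 & \text{if } x \le y, \\
\frac{1}{2} & \text{if } x = 1 \text{ and } y = 0, \\
1 & \text{otherwise.}
\end{cases}
\]

Then the function $f$ which maps any $n \in \mathbb{N}$ to 0 and $\infty$ to 1 is contractive,
but not continuous since $\lim_{n \in \mathbb{N}} n = \infty$, whereas $\lim f(n) = 0 \neq 1 = f(\infty)$.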
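
The counterexample can be checked mechanically on a finite sample of points. The sketch below assumes the reading of the distance function in which $d(x, y) = 0$ for $x \le y$, $d(1, 0) = \frac{1}{2}$, and $d(x, y) = 1$ otherwise; the sample size and the names `d`, `f`, `INF` are our own choices.

```python
import math
from fractions import Fraction

INF = math.inf  # stands for the extra point "infinity"


def d(x, y) -> Fraction:
    """Distance function of the example; the middle value 1/2 is an assumed reading."""
    if x <= y:
        return Fraction(0)
    if x == 1 and y == 0:
        return Fraction(1, 2)
    return Fraction(1)


def f(x):
    """Maps every natural number to 0 and infinity to 1."""
    return 1 if x == INF else 0


points = list(range(30)) + [INF]

# Contractivity with c = 1/2 on the sampled points.
contractive = all(
    d(f(x), f(y)) <= Fraction(1, 2) * d(x, y) for x in points for y in points
)

# (n) converges to infinity: d(INF, y) agrees with d(n, y) for all large n.
tail_agrees = all(d(INF, y) == d(n, y) for y in range(10) for n in range(10, 30))

# f(n) = 0 for every n, yet d(f(INF), 0) = d(1, 0) differs from d(0, 0).
limit_mismatch = d(f(INF), 0) != d(f(5), 0)
```

The last comparison is the failure of CS-continuity: convergence of $(f(n))$ to $f(\infty)$ would require $d(f(\infty), y) = \lim d(f(n), y)$ for every $y$, and this already fails at $y = 0$.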


18We give Theorem 4.6.3 in the form in which it appears in [Rutten, 1996]; see also the 
paper [Rutten, 1995]. A more general version of this result was given in [Smyth, 1987] in 
the context of quasi-uniformities. 
4.6.3 Theorem (Rutten-Smyth) Let $(X, d)$ be a CS-complete quasimetric
space, and let $f : X \to X$ be non-expanding.


(a) If $f$ is CS-continuous and there exists $x \in X$ with $x \le_d f(x)$, then $f$ has
a fixed point, and this fixed point is least above $x$ with respect to $\le_d$.


(b) If $f$ is CS-continuous and contractive, then $f$ has a unique fixed point.


Moreover, in both cases the fixed point can be obtained as the limit of the
Cauchy sequence $(f^n(x))$, where in (a) $x$ is the given point, and in (b) $x$ can
be chosen arbitrarily.


Proof: (a) For all $n, k \in \mathbb{N}$ and $k \ge 1$, we have $d(f^n(x), f^{n+1}(x)) \le
d(x, f(x)) = 0$ and $d(f^n(x), f^{n+k}(x)) \le \sum_{i=0}^{k-1} d(f^{n+i}(x), f^{n+i+1}(x)) = 0$.
Hence, $(f^n(x))$ is a Cauchy sequence and has a unique limit $y$, say. Since
$f(y) = f(\lim f^n(x)) = \lim f(f^n(x)) = \lim f^n(x) = y$, $y$ is a fixed point of $f$.
Now let $z$ be a fixed point of $f$ with $x \le_d z$. Then $d(y, z) = \lim d(f^n(x), z) = 0$,
since $d(f^n(x), f^n(z)) \le d(x, z) = 0$. Hence, $y \le_d z$.

(b) The proof given for Theorem 4.2.3 does not depend on condition (M3) 
other than implicitly for deriving continuity of f from the fact that it is a 
contraction. Since CS-continuity is a hypothesis in statement (b), the proof 
of Theorem 4.2.3 can be carried over by simply replacing “Cauchy sequence” 
by “forward Cauchy sequence” and “continuous” by “CS-continuous”, etc. ∎


4.6.4 Example Let $(X, \le)$ be a partially ordered set. Define a function $d_\le$
on $X \times X$ by

\[
d_\le(x, y) =
\begin{cases}
0 & \text{if } x \le y, \\
1 & \text{otherwise.}
\end{cases}
\]

Then it is easily checked that $(X, d_\le)$ is a quasi-ultrametric space; we call $d_\le$
the discrete quasimetric on $X$. Note that $\le_{d_\le}$ and $\le$ coincide for a given partial
order $\le$, and moreover $(X, d_\le)$ is totally bounded if and only if $X$ is finite.
By virtue of this definition and the definition of $\le_d$ for a given quasimetric
$d$, Part (a) of Theorem 4.6.3 generalizes Kleene’s theorem, Theorem 1.1.9,
and Part (b) of Theorem 4.6.3 generalizes the Banach contraction mapping
theorem, Theorem 4.2.3.¹⁹
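
The claims about the discrete quasimetric are easy to confirm by brute force on a small partial order. The sketch below assumes, for illustration only, the power set of $\{0, 1, 2\}$ ordered by inclusion; `powerset`, `d_leq` and `d_star` are hypothetical names.

```python
from itertools import combinations, product


def powerset(base) -> list:
    """All subsets of base, as frozensets."""
    items = sorted(base)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def d_leq(x: frozenset, y: frozenset) -> int:
    """Discrete quasimetric for the subset order: 0 if x is below y, 1 otherwise."""
    return 0 if x <= y else 1


def d_star(x: frozenset, y: frozenset) -> int:
    """Induced metric d*(x, y) = max{d(x, y), d(y, x)}."""
    return max(d_leq(x, y), d_leq(y, x))


X = powerset({0, 1, 2})

# Strong triangle inequality: d(x, z) <= max{d(x, y), d(y, z)}.
strong_triangle = all(
    d_leq(x, z) <= max(d_leq(x, y), d_leq(y, z)) for x, y, z in product(X, repeat=3)
)

# d(x, y) = d(y, x) = 0 forces x = y, so d* vanishes exactly on the diagonal.
d_star_is_discrete = all(
    (d_star(x, y) == 0) == (x == y) for x, y in product(X, repeat=2)
)
```

Since $d^*$ is the discrete metric here, a finite $\varepsilon$-net for $\varepsilon \le 1$ must contain every point, which is why total boundedness amounts to finiteness of $X$.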


4.6.5 Example Note that it is easy to see that a sequence $(I_n)$ in $I_{P,2}$ is forward
Cauchy relative to the discrete quasimetric $d$ if and only if it is eventually
increasing in the sense that there is a natural number $k$ with the property that
$I_n \subseteq I_{n+1}$ whenever $k \le n$, see [Seda, 1997, Proposition 1].

Consider the sequence $(I_n)$ in the power set $\mathcal{P}(\mathbb{N})$ of the natural numbers
determined by setting $I_n = \mathbb{N}$ if $n$ is even and setting $I_n = \{0\}$ otherwise.
Then $\{0\}$ is the greatest limit, $\mathrm{gl}(I_n)$, of $(I_n)$, yet $(I_n)$ is not forward Cauchy


19For further observations on this point, see [Smyth, 1987, Rutten, 1996]. 
in the discrete quasimetric simply because it is not eventually increasing.
Thus, it appears not to be possible to directly characterize the property of
being forward Cauchy relative to the discrete quasimetric in terms of convergence
in the Scott topology. This contrasts with the situation where the
(forward) Cauchy sequences relative to the quasimetric determined by a level
mapping, see Definition 4.6.9, can be described in terms of convergence in $Q$,
see Proposition 4.6.8 and Corollary 4.6.12. ∎


Using the observations made thus far, it is straightforward to recover the
usual fixed-point semantics of definite logic programs, namely, to recover
Theorem 2.2.3 Part (b) in terms of quasimetrics, by employing Theorem 4.6.3 Part
(a) and the discrete quasimetric on $(I_P, \subseteq)$. We briefly sketch this next and
refer the reader to [Seda, 1997] for full details.


4.6.6 Example Let $P$ denote an arbitrary definite logic program, and let $d$
denote the discrete quasimetric defined on the partially ordered set $(I_{P,2}, \subseteq)$.
Then it is shown in [Seda, 1997] that $(I_{P,2}, d)$ is a CS-complete quasimetric
space and that $T_P$ is CS-continuous. We show here that, in fact, $T_P$ is
non-expansive and hence that Theorem 4.6.3 is applicable.

Suppose first that $d(I_1, I_2) = 0$. Then $I_1 \subseteq I_2$ so that $T_P(I_1) \subseteq T_P(I_2)$,
and hence $d(T_P(I_1), T_P(I_2)) = 0$, as required. Next suppose that $d(I_1, I_2)$
takes value 1. Then immediately $d(I_1, I_2) \ge d(T_P(I_1), T_P(I_2))$, as required.
Thus, $T_P$ is indeed non-expansive relative to $d$. We note that, in contrast, $T_P$
is not usually a contraction relative to any metric or quasimetric, since fixed
points of $T_P$ are not usually unique. In any event, we are now in a position to
apply Theorem 4.6.3 since we have the following facts.


(1) $(I_{P,2}, d)$ is a CS-complete quasimetric space.

(2) $T_P : I_{P,2} \to I_{P,2}$ is non-expansive and CS-continuous.

(3) The empty set $\emptyset$ is a point in $I_{P,2}$ such that $d(\emptyset, T_P(\emptyset)) = 0$.

Thus, on applying Theorem 4.6.3 and examining its proof, we conclude that
$T_P$ has a fixed point equal to the greatest limit $\mathrm{gl}(T_P^n(\emptyset))$, and this, in turn,
is equal to $\bigcup T_P^n(\emptyset) = T_P \uparrow \omega$, as shown in Chapter 3. Thus, we recover the
classical least fixed point of $T_P$, as required.
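
The three facts listed in Example 4.6.6 can be observed on a toy program. The following sketch is only an illustration: the propositional program `PROGRAM` is a hypothetical example of ours, interpretations are frozensets of atoms, and `t_p`, `d` and `kleene_iterates` are assumed helper names.

```python
from itertools import combinations

# A small definite program as (head, body) pairs; a hypothetical example.
PROGRAM = [
    ("a", ()),
    ("b", ("a",)),
    ("c", ("a", "b")),
    ("d", ("e",)),
]
ATOMS = ["a", "b", "c", "d", "e"]


def t_p(interpretation: frozenset) -> frozenset:
    """Immediate consequence operator: heads of clauses whose bodies hold."""
    return frozenset(
        head for head, body in PROGRAM if all(atom in interpretation for atom in body)
    )


def d(i1: frozenset, i2: frozenset) -> int:
    """Discrete quasimetric for the subset order on interpretations."""
    return 0 if i1 <= i2 else 1


def kleene_iterates(start: frozenset = frozenset()) -> list:
    """Iterate t_p from start until the sequence becomes stationary."""
    sequence = [start]
    while t_p(sequence[-1]) != sequence[-1]:
        sequence.append(t_p(sequence[-1]))
    return sequence


interpretations = [
    frozenset(c) for r in range(len(ATOMS) + 1) for c in combinations(ATOMS, r)
]
non_expansive = all(
    d(t_p(i1), t_p(i2)) <= d(i1, i2) for i1 in interpretations for i2 in interpretations
)
iterates = kleene_iterates()
least_fixed_point = iterates[-1]
```

For a finite propositional program the iteration stops after finitely many steps, so the greatest limit is simply the last iterate; for programs with function symbols the union over all $n$ is needed, as in the text.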


We will now use quasimetrics to characterize continuity in the Cantor
topology of the immediate consequence operator for normal logic programs.²⁰


4.6.7 Definition Let $(D, \sqsubseteq)$ be a domain, and let $r : D_c \to \mathbb{N}$ be a function,


20For more details of the results presented in this section, see [Seda, 1997]. 


Fixed-Point Theory for Generalized Metric Spaces 109 


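called a rank function,²¹ such that $r^{-1}(n)$ is a finite set for each $n \in \mathbb{N}$. Define
$d_r : D \times D \to \mathbb{R}$ by²²

\[
d_r(x, y) := \inf\{2^{-n} \mid (c \sqsubseteq x \Rightarrow c \sqsubseteq y) \text{ for all } c \in D_c \text{ with } r(c) < n\}.
\]

Then $d_r$ is called the quasi-ultrametric induced by $r$.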


It is straightforward to see that $(D, d_r)$ is a quasi-ultrametric space.
Furthermore, $d_r$ induces the Scott topology on $D$, and $(D, d_r)$ is totally bounded,
see Proposition 4.6.10.

In order to discuss the relationships between quasimetrics and the Cantor 
topology on spaces of interpretations, we need the following proposition. 


4.6.8 Proposition Let $(X, d)$ be a totally bounded quasimetric space, and
let $(x_n)$ be a Cauchy sequence in $X$. Then, for all $\varepsilon > 0$, there exists $k \in \mathbb{N}$
such that for all $l, m \ge k$, $d^*(x_l, x_m) < \varepsilon$. (A sequence with this property is
usually called a bi-Cauchy sequence.)


Proof: Choose $\varepsilon > 0$ and a finite subset $E \subseteq X$ together with a map $h : \mathbb{N} \to
E$ such that $d^*(x_n, h(n)) < \frac{\varepsilon}{3}$, using total boundedness. Since $(x_n)$ is a Cauchy
sequence, there exists $k_0 \in \mathbb{N}$ such that for all $m \ge l \ge k_0$, $d(x_l, x_m) < \frac{\varepsilon}{3}$.
Now choose $k_1 \ge k_0$ such that for every $e \in E$, the set $h^{-1}(e) \cap \{n \mid n \ge k_1\}$
is either infinite or empty. Choose now $l, m \ge k_1$, and let $p \ge l$ be minimal
such that $h(p) = h(m)$. Then

\[
d(x_l, x_m) \le d(x_l, x_p) + d(x_p, h(p)) + d(h(p), x_m) < 3 \cdot \frac{\varepsilon}{3} = \varepsilon,
\]

and by symmetry $d^*(x_l, x_m) < \varepsilon$. ∎


We next define totally bounded quasi-ultrametrics on $I_P$, for a given program
$P$, by using level mappings and show that these are closely related to
the Cantor topology $Q$.


4.6.9 Definition Let $P$ be a normal logic program, and let $l : B_P \to \mathbb{N}$ be a
level mapping for $P$ such that $l^{-1}(n)$ is finite for every $n \in \mathbb{N}$. The mapping
$l$ induces a rank function $r : I_c \to \mathbb{N}$ defined by

\[
r(I) = \max_{A \in I} \, l(A),
\]

where we take $I_c = (I_P)_c$ to be the set of all finite subsets of $B_P$. By Definition
4.6.7, $r$ induces a quasi-ultrametric $d_r$ on $I_P$.


For a given normal logic program $P$, we will denote $I_{P,2}$ by $I_P$ for the rest
of this section.


21The notion of rank function will be given in more generality in Definition 4.8.12. 
22The definition of d, is similar to one made by M.B. Smyth in Example 5 of the paper 
[Smyth, 1991]. 
4.6.10 Proposition With the notation established above, $(I_P, d_r)$ is a totally
bounded quasi-ultrametric space.


Proof: Choose $\varepsilon = 2^{-n}$, where $n \in \mathbb{N}$, and let $E$ be the set of all subsets
of $B_P$, the atoms of which are all of level less than or equal to $n$. Then $E$ is
finite by our assumption on $l$. For every $I \in I_P$, let $e$ be the restriction of $I$ to
atoms of level less than or equal to $n$. Then $d^*(e, I) < \varepsilon$, as is easily verified. ∎


We have the following characterization of Cauchy sequences in $I_P$.


4.6.11 Proposition A sequence $(I_n)$ in $(I_P, d_r)$ is a Cauchy sequence if and
only if for every $n \in \mathbb{N}$ there exists $k_n \in \mathbb{N}$ such that for all $l, m \ge k_n$ we have
that $I_l$ and $I_m$ agree on all atoms of level less than $n$.


Proof: Let $(I_n)$ be a Cauchy sequence in $I_P$. Choose $n \in \mathbb{N}$, and let $\varepsilon = 2^{-n}$.
Since $I_P$ is totally bounded, Proposition 4.6.8 shows that there exists $k_n \in \mathbb{N}$ such
that for all $l, m \ge k_n$, $d_r^*(I_l, I_m) < 2^{-n}$. By definition of $d_r$, we obtain that $I_l$ and
$I_m$ agree on all atoms of level less than $n$. The converse follows since the argument above
clearly reverses. ∎


4.6.12 Corollary Let $(I_n)$ be a sequence in $(I_P, d_r)$. Then $(I_n)$ is a Cauchy
sequence if and only if $(I_n)$ converges in $Q$ to some $I$. Moreover, $\lim I_n = I$,
so $(I_P, d_r)$ is complete.


Proof: By Proposition 3.3.5 and the previous proposition, $(I_n)$ is a Cauchy
sequence if and only if $(I_n)$ converges in $Q$ to some $I$. It is easily verified that
$\lim I_n = I$ by noting that $I = \{A \in B_P \mid A \in I_n \text{ eventually}\}$. It follows that
$(I_P, d_r)$ is complete. ∎


The previous result allows us to characterize CS-continuity in terms of Q. 


4.6.13 Proposition Suppose that $l : B_P \to \mathbb{N}$ is a level mapping such that
$l^{-1}(n)$ is finite for all $n$. Then the immediate consequence operator $T_P$ is
CS-continuous if and only if it is continuous in $Q$.


Proof: Suppose that $T_P$ is CS-continuous and that $(I_n)$ is an arbitrary sequence
in $I_P$ which converges in $Q$ to some $I \in I_P$. Then $(I_n)$ is a Cauchy
sequence, and by Corollary 4.6.12, $\lim I_n = I$. By CS-continuity of $T_P$, we have
$\lim T_P(I_n) = T_P(I)$, and again by Corollary 4.6.12, we have $T_P(I_n) \to T_P(I)$
in $Q$, as required.

Conversely, suppose $T_P$ is continuous in $Q$ and that $(I_n)$ is a Cauchy
sequence with $\lim I_n = I$, say. By Corollary 4.6.12, $I_n \to I$ in $Q$, and, by
continuity of $T_P$ in $Q$, we get $T_P(I_n) \to T_P(I)$, which yields $\lim T_P(I_n) =
T_P(I)$, again by Corollary 4.6.12. ∎


Our next observation shows that non-expansiveness implies CS-continuity. 




4.6.14 Proposition Let $l : B_P \to \mathbb{N}$ be an arbitrary level mapping satisfying
the condition that $l^{-1}(n)$ is finite for each $n \in \mathbb{N}$. If $T_P$ is non-expanding, then
$T_P$ is continuous in $Q$ and hence is CS-continuous.


Proof: Let $T_P$ be non-expanding, and let $(I_n)$ be a Cauchy sequence with
$\lim I_n = I$. Since $T_P$ is non-expansive, we obtain

\[
0 \le d_r(T_P(I_n), T_P(I)) \le d_r(I_n, I) \to 0
\]

and

\[
0 \le d_r(T_P(I), T_P(I_n)) \le d_r(I, I_n) \to 0
\]

by total boundedness of $I_P$. By definition of $d_r$ and Proposition 4.6.11, it
follows that $T_P(I_n)$ is a Cauchy sequence and, by Proposition 3.3.5 and the
previous inequalities, $T_P(I_n)$ converges in $Q$ to $T_P(I)$. Hence, $\lim T_P(I_n) =
T_P(I)$, again by Corollary 4.6.12. ∎


We close with a brief discussion of several simple examples illustrating the 
methods and results of this section as applied to normal logic programs P 
relative to Tp defined on Ip. For full details the reader is again referred to 
[Seda, 1997]. Thus, suppose that $P$ is a normal logic program, that $d_r$ is the
quasimetric determined by a level mapping $l$ defined on $B_P$ and satisfying the
property that $l^{-1}(n)$ is finite for all $n$, and that $T_P$ is CS-continuous relative
to $d_r$ or equivalently that $T_P$ is continuous in the topology $Q$.


4.6.15 Example Consider again the program $P$ of Example 3.2.3

p(a) ←
p(s(X)) ← p(X)

and define $l$ on $B_P$ by $l(p(s^n(a))) = n$. Then we see that $d_r(T_P(I_1), T_P(I_2)) \le
\frac{1}{2} d_r(I_1, I_2)$ for all $I_1, I_2 \in I_P$. Therefore, $T_P$ is a contraction and is continuous
in $Q$ and, hence, is CS-continuous. Thus, Theorem 4.6.3 applies and produces
a unique fixed point of $T_P$. Of course, this fixed point coincides with the usual
one produced by considering the powers $T_P^n(\emptyset)$.
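
The contraction property claimed in Example 4.6.15 can be tested exhaustively on a truncated Herbrand base. The sketch below rests on assumptions that should be kept in mind: atoms $p(s^n(a))$ are represented by their level $n$ and cut off at a hypothetical bound `MAX_LEVEL`, and $d_r$ is computed by the closed form that results from reading Definition 4.6.7 with the condition $r(c) < n$, namely $d_r(I, J) = 2^{-n_0}$ where $n_0$ is the least level of an atom in $I \setminus J$, and $d_r(I, J) = 0$ if $I \subseteq J$.

```python
from fractions import Fraction
from itertools import combinations

MAX_LEVEL = 5  # atoms p(s^n(a)) are represented by n and truncated at this level


def d_r(i1: frozenset, i2: frozenset) -> Fraction:
    """2**(-n) for the least level n of an atom in i1 but not in i2; 0 if i1 <= i2."""
    missing = i1 - i2
    return Fraction(1, 2 ** min(missing)) if missing else Fraction(0)


def t_p(i: frozenset) -> frozenset:
    """T_P for p(a) <- and p(s(X)) <- p(X), restricted to levels 0..MAX_LEVEL."""
    return frozenset({0}) | frozenset(n + 1 for n in i if n + 1 <= MAX_LEVEL)


levels = range(MAX_LEVEL + 1)
interpretations = [
    frozenset(c) for r in range(MAX_LEVEL + 2) for c in combinations(levels, r)
]

contraction_with_half = all(
    d_r(t_p(i1), t_p(i2)) <= Fraction(1, 2) * d_r(i1, i2)
    for i1 in interpretations
    for i2 in interpretations
)

# Powers of T_P applied to the empty interpretation.
x = frozenset()
for _ in range(MAX_LEVEL + 1):
    x = t_p(x)
```

On the truncated base the iteration reaches the set of all levels, which corresponds to the fixed point $B_P$ of the untruncated operator.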


4.6.16 Example Consider the program $P$

p(s(X), a) ← p(s(X), a)

with the level mapping $l$ defined on $B_P$ by $l(p(s^n(a), s^m(a))) = n + m$. Then
it is readily checked that $T_P$ is non-expansive (and therefore continuous in
$Q$), but not contractive, relative to the quasimetric $d_r$ determined by $l$, since
it is easy to find distinct $I_1$ and $I_2$ such that $d_r(T_P(I_1), T_P(I_2)) = d_r(I_1, I_2)$.
Thus, Theorem 4.6.3 is applicable and, needless to say, produces numerous
fixed points of $T_P$. For this reason, it follows that $T_P$ cannot be a contraction
relative to any metric. Thus, the approach to finding fixed points based on
metrics and the Banach contraction mapping theorem fails even for the rather
simple program $P$.
4.6.17 Example Consider again the program $P$ of Example 3.3.6

p(a) ←
p(s(X)) ← ¬p(X)

and note that $P$ is not stratified nor even locally stratified. Define the level
mapping $l$ on $B_P$ by $l(p(s^n(a))) = n$ for each $n$. We note that in this
case $T_P$ is not non-expansive, for if we take $I_1 = \{p(a), p(s(a))\}$ and $I_2 =
\{p(a), p(s(a)), p(s^2(a))\}$, then $T_P(I_1) = \{p(a), p(s^3(a)), p(s^4(a)), p(s^5(a)), \ldots\}$
and $T_P(I_2) = \{p(a), p(s^4(a)), p(s^5(a)), \ldots\}$. Thus, we have $d_r(I_1, I_2) = 0$ yet
d,(Tp(I1), Tp(I2)) = 2~*, and therefore Tp is not non-expansive. Next, con- 
sider powers I,, = T} (Ø), the first few of which, as we have already seen, are as 
follows: $I_1 = B_P$, $I_2 = \{p(a)\}$, $I_3 = B_P \setminus \{p(s(a))\}$, $I_4 = \{p(a), p(s^2(a))\}$, $I_5 =
B_P \setminus \{p(s(a)), p(s^3(a))\}$, etc. Then we obtain that $d_r(I_n, I_{n+1})$ takes value 0
if n is even and takes value 27”+! if n is odd. Therefore, the sequence (In) 
is Cauchy and converges to $I$, say, in $Q$. By Proposition 3.3.5, we have that
$(I_n)$ converges in $Q$ to the set $\{p(a), p(s^2(a)), p(s^4(a)), \ldots\}$, which therefore
coincides with $I$. It follows that $I$ is a fixed point of $T_P$, since $T_P$ is continuous
in $Q$, and indeed $I$ is the only fixed point of $T_P$, as already noted in
Example 3.3.6.
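
The behaviour described in Example 4.6.17 can be reproduced on a truncated Herbrand base. As before, this is a sketch under assumptions of ours: atoms $p(s^n(a))$ are represented by their level $n$ up to a hypothetical bound `MAX_LEVEL`, and `d_r` uses the closed form $2^{-n_0}$, with $n_0$ the least level of an atom in the first interpretation but not in the second, obtained by reading Definition 4.6.7 with $r(c) < n$. Only qualitative facts are checked, so the conclusions do not depend on that reading of the exponent.

```python
from fractions import Fraction

MAX_LEVEL = 9  # atoms p(s^n(a)) are represented by n and truncated at this level


def d_r(i1: frozenset, i2: frozenset) -> Fraction:
    """2**(-n) for the least level n of an atom in i1 but not in i2; 0 if i1 <= i2."""
    missing = i1 - i2
    return Fraction(1, 2 ** min(missing)) if missing else Fraction(0)


def t_p(i: frozenset) -> frozenset:
    """T_P for p(a) <- and p(s(X)) <- not p(X), restricted to levels 0..MAX_LEVEL."""
    return frozenset({0}) | frozenset(n + 1 for n in range(MAX_LEVEL) if n not in i)


# Witness that T_P is not non-expansive.
i1 = frozenset({0, 1})
i2 = frozenset({0, 1, 2})
gap_before = d_r(i1, i2)
gap_after = d_r(t_p(i1), t_p(i2))

# Powers of T_P applied to the empty interpretation.
x = frozenset()
for _ in range(MAX_LEVEL + 1):
    x = t_p(x)
evens = frozenset(n for n in range(MAX_LEVEL + 1) if n % 2 == 0)
```

Each application of `t_p` settles one more level, which is the finite counterpart of the convergence in $Q$ to $\{p(a), p(s^2(a)), p(s^4(a)), \ldots\}$.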


4.6.18 Example Let $P$ be the program

p(X) ← ¬q(X)
r(s(X)) ← r(X)
q(X) ← q(a), ¬r(X)

which is a slight modification of an example in [Apt et al., 1988, Page 97] and
is stratified. Again, $T_P$ is continuous relative to $Q$, but in this case $T_P$ is not
non-expansive for any choice of level mapping $l$ and corresponding quasimetric
$d_r$. To see this, put $I_1 = \{q(a)\}$ and $I_2 = T_P(I_1) = \{p(s(a)), p(s^2(a)), \ldots\} \cup
\{q(a), q(s(a)), \ldots\}$. Then $d_r(I_1, I_2) = 0$ for any $d_r$ simply because $I_1 \subseteq I_2$.
Since $T_P(I_2) = \{q(a), q(s(a)), \ldots\}$, we must have $d_r(T_P(I_1), T_P(I_2)) > 0$ for
any $d_r$ or in other words for any choice of $l$ and corresponding $d_r$, so that $T_P$
is never non-expansive. Taking $I = \{r(a)\}$ and setting $I_n = T_P^n(I)$, we have
$I_n = \{r(s^n(a))\} \cup \{p(a), p(s(a)), p(s^2(a)), \ldots\}$. Clearly, $(I_n)$ is Cauchy (for any
choice of level mapping and corresponding $d_r$), and $I_n$ converges in $Q$ to the
fixed point $\{p(a), p(s(a)), p(s^2(a)), \ldots\}$.


4.7 A Hierarchy of Fixed-Point Theorems 


For the reader’s convenience, we have collected together in Table 4.3 the 
main fixed-point theorems presented in this chapter, at least for single-valued 
TABLE 4.3: Summary of single-valued fixed-point theorems.


space             name of theorem               reference number   symbol
ω-cpo             Kleene                        1.1.9              K
cpo               Knaster-Tarski                1.1.10             KT
complete metric   Banach                        4.2.3              B
compact metric    —                             4.2.4              cp
gum               Prieß-Crampe and Ribenboim    4.3.6              PCR
d-metric          Matthews                      4.4.6              M
d-gum             —                             4.5.1              dPCR
quasimetric       Rutten-Smyth                  4.6.3              RS
[Diagram with nodes cpu, cp, K, B, PCR, KT, RS, M, dPCR]

FIGURE 4.1: Dependencies between fixed-point theorems from Chapters 1 
and 4. The lower a theorem is placed in the diagram, the more general it is. 
See Table 4.3 for the abbreviations. 


mappings. In fact, we will consider generalizations of several of them to mul- 
tivalued mappings as well in the later sections of this chapter. Furthermore, 
the dependencies between these theorems are depicted in Figure 4.1, where 
the letters abbreviate the theorems as listed in Table 4.3. (The abbreviation 
“cpu” represents the statement that strictly contracting functions on compact 
ultrametric spaces have unique fixed points, which follows immediately from 
Theorem 4.2.4.) 
4.8 Relationships Between the Various Spaces


We move on next to study the relationships which exist between the var- 
ious different spaces we have introduced in this chapter. In particular, we 
focus on the representation of certain relationships in terms of others. This 
will in some cases lead to alternative proofs of fixed-point theorems we have 
already considered. The one exception to this comment is the interplay be- 
tween quasimetrics and partial orders. It is clear from the results of Section 4.6 
that this interplay is strong. But we will not consider it again other than in 
the context of multivalued mappings, see Sections 4.10 and 4.13; see also 
[Smyth, 1987, Smyth, 1991, Bonsangue et al., 1996, Rutten, 1996] for further
details. 


4.8.1 Metrics and Dislocated Metrics 


Our intention here is to establish relationships between metrics and dis- 
located metrics. Furthermore, we will examine several methods of obtaining 
dislocated metrics from metrics, some of which will be applied later, and we 
will show how Matthews’ theorem can be derived from the Banach contraction 
mapping theorem. 

We begin by noting that if $f$ is a contraction with contractivity factor $\lambda$ on
a d-metric space $(X, \varrho)$, then we have $\varrho(f(x), f(x)) \le \lambda \varrho(x, x)$ for all $x \in X$.
Furthermore, the property $\varrho(x, x) = 0$ for all $x \in X$, if $\varrho$ happens to satisfy
this, simply means that the d-metric $\varrho$ is actually a metric. It follows, therefore,
that we are interested in studying the function $u_\varrho : X \to \mathbb{R}$ associated with
any d-metric $\varrho$.


4.8.1 Definition Let $(X, \varrho)$ be a d-metric space. We define the function $u_\varrho :
X \to \mathbb{R}$ by $u_\varrho(x) = \varrho(x, x)$, for all $x \in X$, and call it the dislocation function
of $\varrho$.


Depending on the context, dislocation functions are sometimes also called 
weight functions, see, for example, [Matthews, 1994, Waszkiewicz, 2002]. 

The following result gives a rather general method by which d-metrics can 
be obtained from metrics. 


4.8.2 Proposition Let $(X, d)$ be a metric space, let $u : X \to \mathbb{R}_0^+$ be a function,
and let $T : \mathbb{R}_0^+ \times \mathbb{R}_0^+ \to \mathbb{R}_0^+$ be a symmetric function which satisfies the
triangle inequality. Then $(X, \varrho)$, where

\[
\varrho(x, y) = d(x, y) + T(u(x), u(y))
\]

for all $x, y \in X$ is a d-metric space, and $u_\varrho(x) = T(u(x), u(x))$ for all $x \in X$.
In particular, if $T(x, x) = x$ for all $x \in \mathbb{R}_0^+$, then $u_\varrho = u$.
Proof: We check the axioms for a d-metric. (M2) If $\varrho(x, y) = 0$, then
$d(x, y) + T(u(x), u(y)) = 0$. Hence, $d(x, y) = 0$, and so $x = y$. (M3) Obvious
by symmetry of $d$ and $T$. (M4) Obvious since $d$ and $T$ satisfy the triangle
inequality. ∎


Completeness also carries over if some continuity conditions are imposed. 


4.8.3 Proposition Using the notation of Proposition 4.8.2, let $u$ be continuous 
as a function from $(X, d)$ to $\mathbb{R}_0^+$ (where $X$ is endowed with the topology 
determined by $d$, and $\mathbb{R}_0^+$ is endowed with its usual topology), and let $T$ be 
continuous as a function from the topological product space $(\mathbb{R}_0^+)^2$ to $\mathbb{R}_0^+$, 
satisfying the additional property $T(x, x) = x$ for all $x$. If $(X, d)$ is a complete 
metric space, then $(X, \varrho)$ is a complete d-metric space. 


Proof: Let $(x_n)$ be a Cauchy sequence in $(X, \varrho)$. Thus, for each $\varepsilon > 0$, there 
exists $n_0 \in \mathbb{N}$ such that for all $m, n \ge n_0$ we have $d(x_m, x_n) \le d(x_m, x_n) + 
T(u(x_m), u(x_n)) = \varrho(x_m, x_n) < \varepsilon$. So $(x_n)$ is also a Cauchy sequence in $(X, d)$ 
and therefore has a unique limit $x$ in $(X, d)$. In particular, we have $x_n \to x$ 
in $(X, d)$, and also $u(x_n) \to u(x)$ and $T(u(x_n), u(x)) \to T(u(x), u(x)) = u(x)$. 
We have to show that $\varrho(x_n, x)$ converges to 0 as $n \to \infty$. For all $n \in \mathbb{N}$, we 
obtain $\varrho(x_n, x) = d(x_n, x) + T(u(x_n), u(x)) \to u(x) = u_\varrho(x)$, and it remains 
to show that $\varrho(x, x) = 0$. But this follows from the fact that $(x_n)$ is a Cauchy 
sequence, since it implies that $u(x_n) = u_\varrho(x_n) = \varrho(x_n, x_n) \to 0$ as $n \to \infty$, 
and hence by continuity of $u$ we obtain $u(x) = 0$. ∎ 


An example of a natural function $T$ which satisfies the requirements of 
Propositions 4.8.2 and 4.8.3 is 


$T : \mathbb{R}_0^+ \times \mathbb{R}_0^+ \to \mathbb{R}_0^+ : (x, y) \mapsto \frac{1}{2}(x + y).$ 


We discuss a few more examples of d-metrics; they are partly taken from 
[Matthews, 1992]. 


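4.8.4 Example Let $d$ be the metric $d(x, y) = \frac{1}{2}|x - y|$ on $\mathbb{R}_0^+$, let $u : \mathbb{R}_0^+ \to \mathbb{R}_0^+$ 
be the identity function, and define $T(x, y) = \frac{1}{2}(x + y)$. Then $\varrho$ as defined in 
Proposition 4.8.2 is a d-metric, and $\varrho(x, y) = \frac{1}{2}|x - y| + \frac{1}{2}(x + y) = \max\{x, y\}$ 
for all $x, y \in \mathbb{R}_0^+$. 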
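The construction of Proposition 4.8.2 can be checked numerically for this example. The following Python sketch is only an illustration: the helper names `make_d_metric` and `check_d_metric_axioms` are our own and do not come from the text, and the axioms are tested on a small finite sample of points rather than proved.

```python
from itertools import product

def make_d_metric(d, u, T):
    """Return rho(x, y) = d(x, y) + T(u(x), u(y)) as in Proposition 4.8.2."""
    return lambda x, y: d(x, y) + T(u(x), u(y))

def check_d_metric_axioms(rho, points, tol=1e-12):
    """Check (M2), (M3) and (M4) on a finite sample of points."""
    for x, y in product(points, repeat=2):
        if abs(rho(x, y)) <= tol and x != y:       # (M2): rho(x, y) = 0 forces x = y
            return False
        if abs(rho(x, y) - rho(y, x)) > tol:       # (M3): symmetry
            return False
    for x, y, z in product(points, repeat=3):      # (M4): triangle inequality
        if rho(x, y) > rho(x, z) + rho(z, y) + tol:
            return False
    return True

# Data of Example 4.8.4.
d = lambda x, y: 0.5 * abs(x - y)
u = lambda x: x
T = lambda a, b: 0.5 * (a + b)
rho = make_d_metric(d, u, T)

sample = [0.0, 0.5, 1.0, 2.0, 3.5]   # an arbitrary sample of non-negative reals
agrees_with_max = all(abs(rho(x, y) - max(x, y)) < 1e-12
                      for x in sample for y in sample)
axioms_hold = check_d_metric_axioms(rho, sample)
```

Note that `rho(x, x)` returns `x` here, so the dislocation function coincides with `u`, as the last statement of Proposition 4.8.2 predicts when `T(x, x) = x`.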


4.8.5 Example Let $\mathcal{I}$ be the set of all closed intervals in $\mathbb{R}$. Then $d : \mathcal{I} \times \mathcal{I} \to \mathbb{R}$ 
defined by 


$d([a, b], [c, d]) = \frac{1}{2}(|a - c| + |b - d|)$ 


is a metric on $\mathcal{I}$. Let $u : \mathcal{I} \to \mathbb{R}_0^+$ be defined by 


$u([a, b]) = b - a$ 


and let $T$ be defined as in Example 4.8.4. Then the construction in Proposition 
4.8.2 yields a d-metric $\varrho$ such that 


$\varrho([a, b], [c, d]) = \max\{b, d\} - \min\{a, c\}$ 


for all $[a, b], [c, d] \in \mathcal{I}$. 
Indeed, we obtain 


$\varrho([a, b], [c, d]) = d([a, b], [c, d]) + \frac{1}{2}b - \frac{1}{2}a + \frac{1}{2}d - \frac{1}{2}c$ 
$= \frac{1}{2}(|b - d| + b + d + |a - c| - a - c)$ 
$= \frac{1}{2}(|b - d| + (b + d)) + \frac{1}{2}(|a - c| - (a + c))$ 
$= \max\{b, d\} - \min\{a, c\}.$ 
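The closed form for the interval d-metric can be spot-checked in a few lines of Python. This is a sketch under our own conventions: intervals are represented as pairs `(a, b)` with `a <= b`, the function names are hypothetical, and the second interval's endpoints are called `c` and `e` in the code to avoid clashing with the metric.

```python
def interval_d(i, j):
    """The metric on closed intervals from Example 4.8.5."""
    (a, b), (c, e) = i, j
    return 0.5 * (abs(a - c) + abs(b - e))

def length(i):
    """The function u([a, b]) = b - a."""
    a, b = i
    return b - a

def rho(i, j):
    """d-metric from Proposition 4.8.2 with T(x, y) = (x + y) / 2."""
    return interval_d(i, j) + 0.5 * (length(i) + length(j))

def closed_form(i, j):
    """The claimed closed form max{b, e} - min{a, c}."""
    (a, b), (c, e) = i, j
    return max(b, e) - min(a, c)

intervals = [(-2.0, 1.0), (0.0, 0.0), (0.5, 3.0), (1.0, 4.0), (-1.5, -0.5)]
matches = all(abs(rho(i, j) - closed_form(i, j)) < 1e-12
              for i in intervals for j in intervals)
```

In particular, `rho(i, i)` is the length of the interval `i`, so only degenerate intervals have self-distance zero.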

4.8.6 Example $(\mathbb{R}_0^+, \varrho)$ is a dislocated metric space, where $\varrho$ is defined by 

$\varrho(x, y) = x + y.$ 


The following proposition gives an alternative way of obtaining d- 
ultrametrics from ultrametrics. We will apply this later in Section 5.1.2. 


4.8.7 Proposition Let $(X, d)$ be an ultrametric space, and let $u : X \to \mathbb{R}_0^+$ 
be a function. Then $(X, \varrho)$, where 


$\varrho(x, y) = \max\{d(x, y), u(x), u(y)\}$ 


for all $x, y \in X$, is a d-ultrametric, and $\varrho(x, x) = u(x)$ for all $x \in X$. If 
$u$ is continuous as a function on $(X, d)$, then completeness of $(X, d)$ implies 
completeness of $(X, \varrho)$. 


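Proof: (M2) and (M3) are obvious. 
(M5) We obtain for all $x, y, z \in X$ 


$\varrho(x, y) = \max\{d(x, y), u(x), u(y)\} \le \max\{d(x, z), d(z, y), u(x), u(y)\} \le \max\{\varrho(x, z), \varrho(z, y)\}.$ 


For completeness, let $(x_n)$ be a Cauchy sequence in $(X, \varrho)$. Then $(x_n)$ is a 
Cauchy sequence in $(X, d)$ and converges to some $x \in X$. We then obtain 
$\varrho(x_n, x) = \max\{d(x_n, x), u(x_n), u(x)\} \to u(x)$ as $n \to \infty$. As in the proof of 
Proposition 4.8.3, we obtain $u(x) = 0$, and this completes the proof. ∎ 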


We want to investigate next the relationship between Matthews’ theorem, 
Theorem 4.4.6, and the Banach contraction mapping theorem, Theorem 4.2.3. 


4.8.8 Proposition Let $(X, \varrho)$ be a d-metric space, and define $d : X \times X \to \mathbb{R}$ 
by setting $d(x, y) = \varrho(x, y)$ for $x \ne y$ and by setting $d(x, x) = 0$ for all $x \in X$. 
Then $d$ is a metric on $X$. 


Proof: We obviously have $d(x, x) = 0$ for all $x \in X$. If $d(x, y) = 0$, then either 
$x = y$ or $\varrho(x, y) = 0$, and from the latter we also obtain $x = y$. Symmetry is 
clear. We want to show that $d(x, y) \le d(x, z) + d(z, y)$ for all $x, y, z \in X$. If 
$d(x, z) = \varrho(x, z)$ and $d(z, y) = \varrho(z, y)$, then the inequality is clear. If $d(x, z) = 
0$, then $x = z$, and the inequality reduces to $d(x, y) \le d(x, y)$, which holds. If 
$d(z, y) = 0$, then $z = y$, and the inequality reduces to $d(x, y) \le d(x, y)$, which 
also holds. ∎ 


4.8.9 Definition The metric $d$ just defined from the d-metric $\varrho$ is called the 
metric associated with $\varrho$. 


Considering Step (3) of the proof of Theorem 4.2.5, we easily verify that $\delta$ 
is a dislocated ultrametric and note also that $d$ is the metric associated with 
$\delta$. 

The following proposition allows one to derive from completeness of $d$, in 
general, that $\varrho$ itself is complete. 


4.8.10 Proposition Let $(X, \varrho)$ be a d-metric space, and let $d$ denote the 
metric associated with $\varrho$. If the metric $d$ is complete, then so is $\varrho$. If $f$ is a 
contraction relative to $\varrho$, then $f$ is a contraction relative to $d$ with the same 
contractivity factor. 


Proof: Suppose that $(x_n)$ is a Cauchy sequence in $\varrho$. Then for all $\varepsilon > 0$, 
there exists $n_0$ such that $\varrho(x_k, x_m) < \varepsilon$ for all $k, m \ge n_0$. Consequently, we 
also obtain $d(x_k, x_m) < \varepsilon$ for all $k, m \ge n_0$. Since $d$ is complete, the sequence 
$(x_n)$ converges in $d$ to some $x$, and $d(x_n, x) \to 0$ as $n \to \infty$. We show that 
$\varrho(x_n, x) \to 0$ as $n \to \infty$, and to do this we consider two cases. 

Case i. Assume that the sequence $(x_n)$ is such that there exists $n_0$ satisfying 
the property that for all $m \ge n_0$, we have $x_m \ne x$. Then $\varrho(x_m, x) = d(x_m, x)$ 
for all $m \ge n_0$ so that $\varrho(x_m, x) \to 0$, and hence $\varrho(x_n, x) \to 0$. 

Case ii. Assume that there exist infinitely many $n_k \in \mathbb{N}$ such that $x_{n_k} = x$. 
Since $(x_n)$ is a Cauchy sequence with respect to $\varrho$, we obtain $\varrho(x_{n_k}, x) < \varepsilon$ 
for all $\varepsilon > 0$, and so $\varrho(x, x) = 0$. Hence, $\varrho(x_n, x) = d(x_n, x)$ for all $n \in \mathbb{N}$, and 
we obtain that $\varrho(x_n, x) \to 0$ as $n \to \infty$, as required. 

Let $\lambda \in [0, 1)$ be such that $\varrho(f(x), f(y)) \le \lambda\varrho(x, y)$ for all $x, y \in X$, 
and let $x, y \in X$. If $f(x) = f(y)$, then we have $d(f(x), f(y)) = 0$, hence 
$d(f(x), f(y)) \le \lambda d(x, y)$. If $f(x) \ne f(y)$, then $x \ne y$, and so $d(f(x), f(y)) = 
\varrho(f(x), f(y)) \le \lambda\varrho(x, y) = \lambda d(x, y)$, as required. ∎ 


4.8.11 Proposition Let $(X, \varrho)$ be a complete d-metric space, and let $d$ denote 
the metric associated with $\varrho$. Then the metric $d$ is complete. However, 
if $f$ is a contraction relative to $d$, it does not follow that $f$ is necessarily a 
contraction relative to $\varrho$. 


Proof: Let $(x_n)$ be a Cauchy sequence in $d$. If $(x_n)$ eventually becomes constant, 
then it obviously converges in $d$. So, assume that this is not the case. 
Then the sequence $(x_n)$ must contain infinitely many distinct points; otherwise, 
it would not be a Cauchy sequence. We define a subsequence $(y_n)$ of $(x_n)$ 
which is obtained by removing multiple occurrences of points in $(x_n)$. For each 
$n \in \mathbb{N}$, let $y_n = x_k$, where $k$ is minimal with the property that, for all $m < n$, 
we have $y_m \ne x_k$. Since $(y_n)$ is a subsequence of the Cauchy sequence $(x_n)$, 
we see that $(y_n)$ is also a Cauchy sequence relative to $d$. But, for any two 
elements $y, z$ in the sequence $(y_n)$, we have that $d(y, z) = \varrho(y, z)$ by definition 
of $d$. Therefore, $(y_n)$ is a Cauchy sequence in $\varrho$ and, hence, converges in $\varrho$ to 
some $y_\omega \in X$. So, $(y_n)$ also converges in $d$ to $y_\omega$. We show that $(x_n)$ converges 
to $y_\omega$ in $d$. Let $\varepsilon > 0$ be chosen arbitrarily. Since $(x_n)$ is a Cauchy sequence 
with respect to $d$, there exists an index $n_1$ such that $d(x_k, x_m) < \frac{\varepsilon}{2}$ for all 
$k, m \ge n_1$. Since $(y_n)$ converges to $y_\omega$ in $\varrho$, we also know that there is an index 
$n_2$ with $y_{n_2} = x_{n_3}$ for some index $n_3$ such that $n_3 \ge n_1$ and $d(y_{n_2}, y_\omega) < \frac{\varepsilon}{2}$. 
For all $x_n$ with $n \ge n_3$, we then obtain $d(x_n, y_\omega) \le d(x_n, x_{n_3}) + d(x_{n_3}, y_\omega) < \varepsilon$, 
as required. 

Let $X = \{0, 1\}$, and define the mapping $f : X \to X$ by setting $f(x) = 0$ 
for all $x \in X$. Let $\varrho$ be constant and equal to 1. Then $\varrho$ is a complete d-metric, 
and $f$ is a contraction relative to $d$. However, $\varrho(f(0), f(1)) = \varrho(0, 0) = \varrho(0, 1)$, 
and so $f$ is not a contraction relative to $\varrho$. ∎ 
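The two-point counterexample is small enough to be checked exhaustively. The Python sketch below uses our own helper names (`associated_metric`, `is_contraction`), and it tests candidate contractivity factors only on the grid 0, 0.01, ..., 0.99, which is an assumption made for illustration; the argument in the proof covers every factor in [0, 1).

```python
def associated_metric(rho):
    """Metric d associated with a d-metric rho, as in Proposition 4.8.8."""
    return lambda x, y: 0 if x == y else rho(x, y)

def is_contraction(f, dist, points, factor):
    """Check dist(f(x), f(y)) <= factor * dist(x, y) for all pairs of points."""
    return all(dist(f(x), f(y)) <= factor * dist(x, y)
               for x in points for y in points)

X = [0, 1]
rho = lambda x, y: 1        # the constant d-metric used in the proof
d = associated_metric(rho)
f = lambda x: 0             # the constant mapping used in the proof

contracts_in_d = is_contraction(f, d, X, 0.5)
contracts_in_rho = any(is_contraction(f, rho, X, k / 100) for k in range(100))
```

Here `contracts_in_d` is true because `d(f(x), f(y))` is always 0, while `contracts_in_rho` is false because `rho(f(x), f(y)) = 1 = rho(x, y)` for every pair.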


The results we have just established put us in a position to prove Matthews’ 
theorem, Theorem 4.4.6, by using the Banach contraction mapping theorem, 
Theorem 4.2.3, and this we do next. 


Proof of Theorem 4.4.6 Let $(X, \varrho)$ be a complete d-metric space, and let $f$ 
be a contraction relative to $\varrho$. Let $d$ be the metric associated with $\varrho$. Then $d$ is 
a complete metric, and $f$ is a contraction relative to $d$. Hence, $f$ has a unique 
fixed point by the Banach contraction mapping theorem, Theorem 4.2.3. ∎ 
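As a small numerical illustration of the statement just proved, the following Python sketch iterates a contraction on the d-metric space of Example 4.8.6. The mapping `f(x) = x / 2`, the starting point and the number of steps are our own choices and are not taken from the text.

```python
def rho(x, y):
    """The d-metric rho(x, y) = x + y of Example 4.8.6 on the non-negative reals."""
    return x + y

def f(x):
    """A contraction relative to rho: rho(f(x), f(y)) = rho(x, y) / 2."""
    return x / 2.0

def iterate(g, x, steps):
    """Apply g to x the given number of times."""
    for _ in range(steps):
        x = g(x)
    return x

x0 = 8.0
x_limit = iterate(f, x0, 60)            # approximates the fixed point 0
self_distance = rho(x_limit, x_limit)   # tends to rho(0, 0) = 0
```

The iterates approach 0, which is the only point with `f(x) == x`, and the self-distance of the iterates tends to 0 as well.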


4.8.2 Domains as GUMS 


It is our intention here to cast Scott domains into ultrametric spaces, a 
construction we will use later in Chapter 5. Usually, domains are endowed with 
the Scott topology, see Section A.6. However, as we will see next, domains can 
be endowed with the structure of a spherically complete ultrametric space. 
This is not something normally considered in domain theory. However, as 
already noted at the beginning of the chapter, one of the objectives of the 
chapter is to discuss a variety of distance functions, including (generalized) 
ultrametrics, which have applications both in logic programming and more 
generally in theoretical computer science.²³ 


As in Remark 4.3.2, let $\gamma$ denote an arbitrary ordinal, and let $\Gamma_\gamma$ denote 
the set $\{2^{-\alpha} \mid \alpha < \gamma\}$ of symbols $2^{-\alpha}$ ordered by $2^{-\alpha} < 2^{-\beta}$ if and only 
if $\beta < \alpha$. As already noted, this ordering is, in effect, the dual of the usual 
ordering on $\gamma$. However, we find it convenient to work with the set $\Gamma_\gamma$ and the 
ordering just defined, rather than with the dual ordering on $\gamma$, especially in 
the context of contraction mappings whose contractivity factor is $\frac{1}{2}$, see, for 
example, Proposition 4.8.17 and particularly Theorem 5.1.6. 

We recall that the set of compact elements in a domain $D$ is denoted by 
$D_c$, see Definition 1.1.4. 


4.8.12 Definition Let $r : D_c \to \gamma$ be a function, called a rank function, form 
$\Gamma_{\gamma+1}$, and denote $2^{-\gamma}$ by 0. Define $\varrho_r : D \times D \to \Gamma_{\gamma+1}$ by $\varrho_r(x, y) = \inf\{2^{-\alpha} \mid 
c \sqsubseteq x$ if and only if $c \sqsubseteq y$ for every $c \in D_c$ with $r(c) < \alpha\}$. 


It is readily checked that $(D, \varrho_r)$ is a generalized ultrametric space. We 
call $\varrho_r$ the generalized ultrametric induced by the rank function $r$. Indeed, the 
intuition behind $\varrho_r$ is that two elements $x$ and $y$ of the domain $D$ are close 
if they dominate the same compact elements up to a certain rank (and hence 
agree in this sense up to this rank); the higher the rank giving agreement, the 
closer are $x$ and $y$. Furthermore, $(D, \varrho_r)$ is spherically complete. The proof 
of this claim does not make use of the existence of a bottom element of $D$, 
so this requirement can be omitted. The main idea of the proof is captured 
in the next lemma, which shows that chains of balls give rise to chains of 
elements in the domain. It depends on the following two elementary facts, 
which result immediately from Lemma 4.3.5: (1) if $\gamma \le \delta$ and $x \in B_\delta(y)$, then 
$B_\gamma(x) \subseteq B_\delta(y)$, and (2) if $B_\gamma(x) \subset B_\delta(y)$, then $\delta \not\le \gamma$ (thus, $\gamma < \delta$, if $\Gamma$ is 
totally ordered). 

In order to simplify notation in the following proofs, we will denote the 
ball $B_{2^{-\alpha}}(x)$ by $B^\alpha(x)$. 
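To make the distance induced by a rank function concrete, here is a Python sketch on a very small domain. Everything specific in it is an assumption made for illustration: the domain is the powerset of {0, 1, 2, 3} ordered by inclusion (so every element is compact), the rank function `rank` is our own choice, and the ordinal is taken to be 5, so that the symbol 2^-5 plays the role of 0.

```python
from itertools import combinations

ATOMS = range(4)
GAMMA = 5   # ranks lie in {0, ..., 4}; the symbol 2^-GAMMA is identified with 0

def compact_elements():
    """All subsets of ATOMS; in a finite powerset every element is compact."""
    return [frozenset(c) for k in range(len(ATOMS) + 1)
            for c in combinations(ATOMS, k)]

def rank(c):
    """Hypothetical rank function: 0 for the empty set, else 1 + largest atom."""
    return 0 if not c else 1 + max(c)

def agreement_level(x, y):
    """Largest alpha <= GAMMA such that x and y dominate the same compact
    elements of rank < alpha; the distance is then the symbol 2^-alpha."""
    alpha = 0
    while alpha < GAMMA and all((c <= x) == (c <= y)
                                for c in compact_elements()
                                if rank(c) < alpha + 1):
        alpha += 1
    return alpha

def distance(x, y):
    """Numerical stand-in for the distance: 0 if the level is GAMMA, else 2^-level."""
    a = agreement_level(x, y)
    return 0.0 if a == GAMMA else 2.0 ** (-a)
```

For example, `{0, 1}` and `{0, 2}` agree on all compact elements of rank below 2 but differ on `{1}`, which has rank 2, so their distance is 2^-2. The infimum in the definition corresponds to the largest admissible exponent because the symbols are ordered dually to the exponents.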


4.8.13 Lemma Let $B^\beta(y)$ and $B^\alpha(x)$ be arbitrary balls in $(D, \varrho_r)$. Then the 
following statements hold. 


(a) For any $z \in B^\beta(y)$, we have $\{c \in \mathrm{approx}(z) \mid r(c) < \beta\} = \{c \in \mathrm{approx}(y) \mid 
r(c) < \beta\}$. 


(b) $B_\beta = \bigsqcup\{c \in \mathrm{approx}(y) \mid r(c) < \beta\}$ and $B_\alpha = \bigsqcup\{c \in \mathrm{approx}(x) \mid r(c) < 
\alpha\}$ both exist. 


(c) $B_\beta \in B^\beta(y)$ and $B_\alpha \in B^\alpha(x)$. 


(d) Whenever $B^\alpha(x) \subseteq B^\beta(y)$, we have $B_\beta \sqsubseteq B_\alpha$. 


²³ This point of view is further developed in a number of papers including the following: 
[Kuhlmann, 1999], [Ribenboim, 1996], [Bouamama et al., 2000], [Prieß-Crampe, 1990]; also 
the papers [Prieß-Crampe and Ribenboim, 1993], [Prieß-Crampe and Ribenboim, 2000c], 
[Prieß-Crampe and Ribenboim, 2000b], [Prieß-Crampe and Ribenboim, 2000a] should be 
consulted. 


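Proof: (a) Since $\varrho_r(z, y) \le 2^{-\beta}$, the first statement follows immediately from 
the definition of $\varrho_r$. 

(b) Since the set $\{c \in \mathrm{approx}(z) \mid r(c) < \beta\}$ is bounded by $z$, for any $z$ and 
$\beta$, the second statement follows immediately from the consistent completeness 
of $D$. 

(c) By definition, we obtain $B_\beta \sqsubseteq y$. Since $B_\beta$ and $y$ agree on all $c \in D_c$ 
with $r(c) < \beta$, the first statement in (c) holds, and the second similarly. 

(d) First note that $x \in B^\beta(y)$, so that $B^\beta(y) = B^\beta(x)$, and the hypothesis 
can be written as $B^\alpha(x) \subseteq B^\beta(x)$. We consider two cases. 

Case i. If $\beta \le \alpha$, then using (a) and noting again that $x \in B^\beta(y)$, we get 
$B_\beta = \bigsqcup\{c \in \mathrm{approx}(y) \mid r(c) < \beta\} = \bigsqcup\{c \in \mathrm{approx}(x) \mid r(c) < \beta\} \sqsubseteq \bigsqcup\{c \in 
\mathrm{approx}(x) \mid r(c) < \alpha\} = B_\alpha$, as required. 

Case ii. If $\alpha < \beta$, then we cannot have $B^\alpha(x) \subset B^\beta(x)$, and we therefore 
obtain $B^\alpha(x) = B^\beta(x)$ and consequently $B^\alpha(B_\beta) = B^\beta(B_\beta) = B^\beta(B_\alpha)$ using 
(c). With the argument of Case i and noting that $y \in B^\alpha(x)$, it follows that 
$B_\alpha \sqsubseteq B_\beta$. We want to show that $B_\alpha = B_\beta$. Assume, in fact, that $B_\alpha \sqsubset B_\beta$. 
Since any point of a ball is its centre, we can take $z = B_\beta$ in (b), twice, to 
obtain $B_\beta = \bigsqcup\{c \in \mathrm{approx}(B_\beta) \mid r(c) < \beta\}$ and $B_\alpha = \bigsqcup\{c \in \mathrm{approx}(B_\beta) \mid 
r(c) < \alpha\}$. Thus, the supposition $B_\alpha \sqsubset B_\beta$ means that $\bigsqcup\{c \in \mathrm{approx}(B_\alpha) \mid 
r(c) < \alpha\} \sqsubset \bigsqcup\{c \in \mathrm{approx}(B_\beta) \mid r(c) < \beta\}$. Since $\{c \in \mathrm{approx}(B_\beta) \mid r(c) < 
\alpha\} \subseteq \{c \in \mathrm{approx}(B_\beta) \mid r(c) < \beta\}$, there must be some $d \in \{c \in \mathrm{approx}(B_\beta) \mid 
r(c) < \beta\}$ with $d \not\sqsubseteq \bigsqcup\{c \in \mathrm{approx}(B_\beta) \mid r(c) < \alpha\} = B_\alpha$. Thus, there is an 
element $d \in D_c$ with $r(d) < \beta$ satisfying $d \not\sqsubseteq B_\alpha$ and $d \sqsubseteq B_\beta$. This contradicts 
the fact that $\varrho_r(B_\alpha, B_\beta) \le 2^{-\beta}$. Hence, $B_\alpha \not\sqsubset B_\beta$. Since $B_\alpha \sqsubseteq B_\beta$, it follows 
that $B_\alpha = B_\beta$ and therefore that $B_\beta \sqsubseteq B_\alpha$, as required. ∎ 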


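4.8.14 Theorem The ultrametric space $(D, \varrho_r)$ is spherically complete. 


Proof: By the previous lemma, every chain $(B^\alpha(x_\alpha))$ of balls in $D$ gives rise 
to a chain $(B_\alpha)$ in $D$ in reverse order. Let $B = \bigsqcup B_\alpha$. Now let $B^\alpha(x_\alpha)$ be 
an arbitrary ball in the chain. It suffices to show that $B \in B^\alpha(x_\alpha)$. Since 
$B_\alpha \in B^\alpha(x_\alpha)$, we have $\varrho_r(B_\alpha, x_\alpha) \le 2^{-\alpha}$. But $\varrho_r$ is a generalized ultrametric, 
and so it suffices to show that $\varrho_r(B, B_\alpha) \le 2^{-\alpha}$. For every compact element 
$c \sqsubseteq B_\alpha$, we have $c \sqsubseteq B$ by construction of $B$. Now let $c \sqsubseteq B$ with $c \in D_c$ and 
$r(c) < \alpha$. We have to show that $c \sqsubseteq B_\alpha$. Since $c$ is compact and $c \sqsubseteq B$, there 
exists $B_\beta$ in the chain with $c \sqsubseteq B_\beta$. If $B^\alpha(x_\alpha) \subseteq B^\beta(x_\beta)$, then $B_\beta \sqsubseteq B_\alpha$ by 
Lemma 4.8.13, and therefore $c \sqsubseteq B_\alpha$. If $B^\beta(x_\beta) \subset B^\alpha(x_\alpha)$, then $\alpha < \beta$, and, 
since $c \sqsubseteq B_\beta$, we see that $c$ is an element of the set $\{c \in \mathrm{approx}(x_\beta) \mid r(c) < 
\alpha\} = \{c \in \mathrm{approx}(x_\alpha) \mid r(c) < \alpha\}$. Since $B_\alpha$ is the supremum of the latter 
set, we have $c \sqsubseteq B_\alpha$, as required. ∎ 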


We will apply this result in Section 5.1.1. 




4.8.3 GUMS and Chain Complete Posets 


In this section, we will invert the point of view of the previous one by 
associating a chain-complete partial order with any generalized ultrametric 
space $(X, \varrho, \Gamma)$ whose distance set $\Gamma$ is an ordinal endowed with, essentially, 
the dual ordering as considered in the previous section. Thus, for the duration 
of this section, $\Gamma$ is the set $\Gamma_{\gamma+1}$ for some ordinal $\gamma$ with the ordering described 
in Remark 4.3.2. For convenience, we will henceforth call such a generalized 
ultrametric space a gum with ordinal distances; recall that we denote $2^{-\gamma}$ by 
0. 

The motivation for adopting our current point of view is to provide a 
domain-theoretic proof of the Prieß-Crampe and Ribenboim theorem.²⁴ In 
fact, we will prove the Prieß-Crampe and Ribenboim theorem using the 
Knaster-Tarski theorem in this special case of gums with ordinal distances. 
As a matter of fact, this special case will suffice for all our purposes since, in 
applications, all the gums we encounter have ordinal distances, simply because 
they arise from level mappings. 

Our main technical tool is the space of formal balls associated with a given 
metric space, see [Edalat and Heckmann, 1998]. Our first task is to extend this 
notion to generalized ultrametrics.²⁵ 

Let $(X, \varrho, \Gamma)$ be a generalized ultrametric space with ordinal distances, 
and let $B'X$ be the set of all pairs $(x, \alpha)$ with $x \in X$ and $\alpha \in \Gamma$. We define 
an equivalence relation $\sim$ on $B'X$ by setting $(x_1, \alpha_1) \sim (x_2, \alpha_2)$ if and only 
if $\alpha_1 = \alpha_2$ and $\varrho(x_1, x_2) \le \alpha_1$. The quotient space $BX = B'X/\!\sim$ will be 
called the space of formal balls associated with $(X, \varrho, \Gamma)$, and it carries an 
ordering $\sqsubseteq$ which is well-defined (on representatives of equivalence classes) by 
$(x, \alpha) \sqsubseteq (y, \beta)$ if and only if $\varrho(x, y) \le \alpha$ and $\beta \le \alpha$. We denote the equivalence 
class of $(x, \alpha)$ by $[(x, \alpha)]$, and note of course that the use of the same symbol 
$\sqsubseteq$ between equivalence classes and their representatives should not cause any 
confusion. 
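The definitions of the equivalence relation and of the ordering on formal balls can be tried out on a toy gum with ordinal distances. In the Python sketch below, all concrete choices are our own assumptions: the points are the binary strings of length 3, the distance between two strings is the symbol 2^-n where n is the length of their longest common prefix (so 2^-3 plays the role of 0), and a distance is stored as its exponent. Because the symbols are ordered dually to their exponents, "distance at most 2^-a" becomes "common prefix of length at least a".

```python
from itertools import product

def cpl(x, y):
    """Length of the longest common prefix of two strings of equal length."""
    n = 0
    while n < len(x) and x[n] == y[n]:
        n += 1
    return n

def equivalent(b1, b2):
    """(x1, a1) ~ (x2, a2) iff the radii agree and rho(x1, x2) <= 2^-a1."""
    (x1, a1), (x2, a2) = b1, b2
    return a1 == a2 and cpl(x1, x2) >= a1

def below(b1, b2):
    """(x, a) is below (y, b) iff rho(x, y) <= 2^-a and 2^-b <= 2^-a."""
    (x, a), (y, b) = b1, b2
    return cpl(x, y) >= a and b >= a

def canonical(ball):
    """A canonical representative of the class of (x, a): the prefix of length a."""
    x, a = ball
    return (x[:a], a)

# All representatives (x, a) with x a binary string of length 3 and a in {0, ..., 3}.
balls = [("".join(s), a) for s in product("ab", repeat=3) for a in range(4)]
```

In this toy space, the maximal formal balls are exactly those with exponent 3, one for each point, which matches the role played by the classes $[(x, 0)]$ in the text.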


4.8.15 Proposition The set $BX$ is partially ordered by $\sqsubseteq$. Moreover, $X$ is 
spherically complete if and only if $BX$ is chain complete. 


Proof: That $BX$ is partially ordered by $\sqsubseteq$ is clear. 

Let $X$ be spherically complete, and let $[(x_\beta, \beta)]$ be an ascending chain in 
$BX$. Then $B_\beta(x_\beta)$ is a chain of balls in $X$ with non-empty intersection; let 
$x \in \bigcap B_\beta(x_\beta)$. Then $\varrho(x_\beta, x) \le \beta$ for all $\beta$. Hence, the chain $[(x_\beta, \beta)]$ in $BX$ 
has $[(x, 0)]$ as an upper bound. Now consider the set $A$ of all $\alpha \in \Gamma$ such 
that $[(x, \alpha)]$ is an upper bound of $[(x_\beta, \beta)]$. Since we are working with ordinal 
distances only, the set $A$ has a supremum $\gamma$, and hence $[(x, \gamma)]$ is the least 
upper bound of the chain $[(x_\beta, \beta)]$. 

Now suppose $BX$ is chain complete, and let $(B_\beta(x_\beta))_{\beta \in \Lambda}$ be a chain of 
balls in $X$, where $\Lambda \subseteq \Gamma$. Then $[(x_\beta, \beta)]$ is an ascending chain in $BX$ and has 
least upper bound $[(x, \gamma)]$, and hence $B_\gamma(x) \subseteq \bigcap_{\beta \in \Lambda} B_\beta(x_\beta)$. ∎ 


²⁴ This approach is inspired by [Edalat and Heckmann, 1998], where the Banach contraction 
mapping theorem is derived from Kleene’s theorem. 
²⁵ For more details, see [Hitzler and Seda, 2003]. 


4.8.16 Proposition The function $\iota : X \to BX$, where $\iota(x) = [(x, 0)]$ for each 
$x \in X$, is injective, and $\iota(X)$ is the set of all maximal elements of $BX$. 


Proof: Injectivity of $\iota$ follows from (U2). The observation that the maximal 
elements of $BX$ are exactly the elements of the form $[(x, 0)]$ completes the 
proof. ∎ 


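Now suppose that $f$ is a strictly contracting mapping on a generalized 
ultrametric space $(X, \varrho, \Gamma)$ with ordinal distances. We use $f$ to induce a mapping 
$Bf : BX \to BX$ defined by 


$Bf(x, 2^{-\alpha}) = \begin{cases} (f(x), 2^{-(\alpha+1)}) & \text{if } 2^{-\alpha} \ne 0, \\ (f(x), 0) & \text{if } 2^{-\alpha} = 0. \end{cases}$ 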


4.8.17 Proposition If $f$ is strictly contracting, then $Bf$ is monotonic. 


Proof: Let $(x, 2^{-\alpha}) \sqsubseteq (y, 2^{-\beta})$, so that $\varrho(x, y) \le 2^{-\alpha}$ and $\alpha \le \beta$. If $2^{-\alpha} = 0$, 
there is nothing to show, so assume $2^{-\alpha} \ne 0$. It then remains to show that 
$\varrho(f(x), f(y)) \le 2^{-(\alpha+1)}$, and this holds since $f$ is strictly contracting and 
because the following Statements (i) and (ii) hold, as is easily verified, namely, 
(i) $\alpha + 1 \le \beta + 1$ if $2^{-\beta} \ne 0$, and (ii) $\alpha + 1 \le \beta$ if $2^{-\beta} = 0$ and $\alpha \ne \beta$. ∎ 


Alternative Proof of Theorem 4.3.6 Let $(X, \varrho, \Gamma)$ be a spherically complete 
generalized ultrametric space with ordinal distances, and let $f : X \to X$ 
be strictly contracting. Then $BX$ is a chain-complete partially ordered set, 
and $Bf$ is a monotonic mapping on $BX$. For $B_0 \in BX$, we denote by $\uparrow B_0$ the 
upper cone of $B_0$, that is, the set of all $B \in BX$ with $B_0 \sqsubseteq B$, as defined in 
Section 3.2. 

Let $x \in X$ be arbitrarily chosen, assume without loss of generality that 
$x \ne f(x)$, and also let $\alpha$ be an ordinal such that $\varrho(x, f(x)) = 2^{-\alpha}$. Then 
$(x, 2^{-\alpha}) \sqsubseteq (f(x), 2^{-(\alpha+1)})$, and by monotonicity of $Bf$ we obtain that $Bf$ 
maps $\uparrow[(x, 2^{-\alpha})]$ into itself. Since $\uparrow[(x, 2^{-\alpha})]$ is a chain-complete partial order 
with bottom element $[(x, 2^{-\alpha})]$, we obtain by the Knaster-Tarski theorem, 
Theorem 1.1.10, that $Bf$ has a least fixed point in $\uparrow[(x, 2^{-\alpha})]$, which we will 
denote by $B_0$. 

It is clear by definition of $Bf$ that $B_0$ must be maximal in $BX$ and, hence, 
is of the form $[(x_0, 0)]$. From $Bf[(x_0, 0)] = [(x_0, 0)]$, we obtain $f(x_0) = x_0$, so 
that $x_0$ is a fixed point of $f$. 

Now assume that $y \ne x_0$ is another fixed point of $f$. Then $\varrho(x_0, y) = 
\varrho(f(x_0), f(y)) < \varrho(x_0, y)$ since $f$ is strictly contracting. This contradiction 
establishes that $f$ has no fixed point other than $x_0$. ∎ 


We note finally that the constructions used for casting domains into generalized 
ultrametrics as in Section 4.8.2 and for casting generalized ultrametrics 
into chain-complete partial orders as in Section 4.8.3 are not inverses of each 
other, and the exact relationship between these processes remains to be 
determined. 


4.8.4 GUMS and d-GUMS 


We move next to study relationships between gums and d-gums and pro- 
vide results somewhat parallel to those of Section 4.8.1, where we contrasted 
metrics and d-metrics. Indeed, our main objective here is to investigate the 
relationship between the Prieß-Crampe and Ribenboim theorem, Theorem 
4.3.6, and its dislocated version, Theorem 4.5.2. 


4.8.18 Proposition Let $(X, \varrho, \Gamma)$ be a dislocated generalized ultrametric 
space, and define $d : X \times X \to \Gamma$ by setting $d(x, y) = \varrho(x, y)$ for $x \ne y$ 
and setting $d(x, x) = 0$ for all $x \in X$. Then $d$ is a generalized ultrametric. 


Proof: The proof is straightforward following Proposition 4.8.8. ∎ 


4.8.19 Definition The generalized ultrametric $d$ just defined from the 
d-generalized ultrametric $\varrho$ is called the generalized ultrametric associated with 
$\varrho$. 


4.8.20 Proposition Let $(X, \varrho, \Gamma)$ be a dislocated generalized ultrametric 
space, and let $d$ denote the generalized ultrametric associated with $\varrho$. If $d$ is 
spherically complete, then $\varrho$ is spherically complete. If $f$ is strictly contracting 
relative to $\varrho$, then $f$ is strictly contracting relative to $d$. 


Proof: We first show that non-empty balls in $\varrho$ contain all their midpoints. 
So let $\{y \mid \varrho(x, y) \le \alpha\}$ be some non-empty ball in $\varrho$ with midpoint $x$. Then 
there is some $z \in \{y \mid \varrho(x, y) \le \alpha\}$, and we obtain $\varrho(x, x) \le \varrho(x, z)$ by (U4). 
Since $\varrho(x, z) \le \alpha$, we have $x \in \{y \mid \varrho(x, y) \le \alpha\}$. Hence, every non-empty ball 
in $\varrho$ is also a ball with respect to $d$. 

Now let $\mathcal{B}$ be a chain of non-empty balls in $\varrho$. Then $\mathcal{B}$ is also a chain of 
balls in $d$ and has non-empty intersection by spherical completeness of $d$, as 
required. 

Let $x, y \in X$ with $x \ne y$, and assume $\varrho(f(x), f(y)) < \varrho(x, y)$. If $f(x) = 
f(y)$, then $d(f(x), f(y)) = 0$, and hence $d(f(x), f(y)) < d(x, y)$. If $f(x) \ne f(y)$, 
then $x \ne y$, and so $d(f(x), f(y)) = \varrho(f(x), f(y)) < \varrho(x, y) = d(x, y)$, as 
required. ∎ 


4.8.21 Proposition Let $(X, \varrho, \Gamma)$ be a spherically complete dislocated generalized 
ultrametric space, and let $d$ denote the generalized ultrametric associated 
with $\varrho$. Then $d$ is spherically complete. However, if $f$ is strictly contracting 
relative to $d$, it does not follow that $f$ is necessarily strictly contracting 
relative to $\varrho$. 


Proof: Let $\mathcal{B}$ be a chain of balls in $d$. If $\mathcal{B}$ contains a ball $B = \{x\}$ for some 
$x \in X$, then $x$ is in the intersection of the chain. So assume that all balls in 
$\mathcal{B}$ contain more than one point. 

Now let $B_\gamma(x_m) = \{x \mid d(x, x_m) \le \gamma\}$ be a ball in $\mathcal{B}$, and let $z \in B_\gamma(x_m)$ 
with $z \ne x_m$. Then $\varrho(x_m, x_m) \le \varrho(z, x_m) = d(z, x_m) \le \gamma$; hence $B_\gamma(x_m) = 
\{x \mid \varrho(x, x_m) \le \gamma\}$. It follows that $\mathcal{B}$ is also a chain of balls in $\varrho$ and, hence, 
has non-empty intersection by spherical completeness of $\varrho$, as required. 

Let $X = \{0, 1\}$, and define a mapping $f : X \to X$ by $f(x) = 0$ for all 
$x \in X$. Let $\varrho$ be constant and equal to 1. Then $(X, \varrho, \{0, 1\})$, where $0 < 1$, 
is a spherically complete d-gum and $f$ is strictly contracting relative to $d$. 
However, $\varrho(f(0), f(1)) = \varrho(0, 0) = \varrho(0, 1)$, and so $f$ is not strictly contracting 
relative to $\varrho$. ∎ 


We can now use Theorem 4.3.6 to give an easy proof of Theorem 4.5.2, 
as follows. With the notation used in Theorems 4.3.6 and 4.5.2 and using 
Proposition 4.8.18, we obtain a generalized ultrametric space $(X, d, \Gamma)$ which is 
spherically complete by Proposition 4.8.21. By Proposition 4.8.20, the function 
$f$ is strictly contracting relative to $d$. Hence, by Theorem 4.3.6, $f$ has a unique 
fixed point. 

We close this section by giving two constructions of d-gums from gums. 


4.8.22 Proposition Let $(X, d, \Gamma)$ be a generalized ultrametric space with 
ordinal distances, and let $u : X \to \Gamma$ be a function. Then the distance function 
$\varrho$ defined by 


$\varrho(x, y) = \max\{d(x, y), u(x), u(y)\}$ 


is a dislocated generalized ultrametric on $X$. 


Proof: (U2) and (U3) are trivial. For (U4), see the proof of Proposition 4.8.7. ∎ 


This result will be applied in Section 5.1.3. 


4.8.23 Proposition Let $(X, d, \Gamma)$ be a generalized ultrametric space with 
ordinal distances, let $z \in X$, and define the distance function $\varrho$ by 


$\varrho(x, y) = \max\{d(x, z), d(y, z)\}.$ 


Then $(X, \varrho, \Gamma)$ is a spherically complete, dislocated generalized ultrametric 
space. 


Proof: Clearly, $\varrho$ is a d-gum. For spherical completeness, note that every 
non-empty ball in $(X, \varrho, \Gamma)$ contains $z$, and this suffices. ∎ 


This result will be applied in Section 5.1.4. 


4.9 Fixed-Point Theory for Multivalued Mappings 


We close this chapter with a discussion of multivalued mappings and some 
of the fixed-point theorems which are applicable to them. 

Let $X$ be a set. Then a multivalued mapping $T$ defined on $X$ is simply a 
mapping $T : X \to \mathcal{P}(X)$ from $X$ to the power set $\mathcal{P}(X)$ of $X$; thus, for each 
$x \in X$, $T(x)$ is a subset of $X$. Furthermore, a fixed point of a multivalued 
mapping $T$ is an element $x$ of $X$ such that $x \in T(x)$. Such mappings are 
important in studying semantics in the presence of non-determinism because 
at any step in the execution of a non-deterministic program, there will in 
general be many possible successive states, and therefore the informal meaning 
of such a program may be taken to be a multivalued mapping defined on the 
set X of states the program may assume. These comments apply in particular 
to disjunctive logic programs in which the head of a typical program clause 
contains a disjunction of several atoms, rather than a single atom, and in 
executing such a program a non-deterministic choice has to be made of an 
atom in the head of any clause involved in the execution. 
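For a finite set of states, the definition of a fixed point of a multivalued mapping translates directly into code. In the Python sketch below, the representation of $T$ as a dictionary from states to sets of successor states, the helper name `fixed_points` and the particular mapping are our own illustrative choices.

```python
def fixed_points(T, X):
    """All x in X with x in T(x), for a multivalued mapping T given as a dict."""
    return {x for x in X if x in T[x]}

X = {0, 1, 2, 3}
# A hypothetical multivalued mapping on X; note that T(2) is empty here.
T = {0: {1, 2}, 1: {1, 3}, 2: set(), 3: {0, 3}}

fps = fixed_points(T, X)   # the states that may be followed by themselves
```

In this example the fixed points are the states 1 and 3, since these are the only states that occur among their own possible successors.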

Not surprisingly, given their informal meaning, the formal meaning of dis- 
junctive programs involves fixed points of multivalued mappings. Therefore, it 
is of interest to consider fixed-point theorems in this context and the methods 
used to establish them. Again, not surprisingly, the methods normally used 
to establish such theorems depend either on order theory or on generalized 
metrics of one type or another, and we consider both approaches. 

We begin by considering an interesting recent paper by Straccia, Ojeda- 
Aciego, and Damásio, see [Straccia et al., 2009], and relating their work to 
ours. In this paper, the authors use methods depending on order theory to 
establish a number of results guaranteeing the existence of least and greatest 
fixed points of a multivalued mapping $T : L \to \mathcal{P}(L)$, where $L$ is a complete 
lattice. In contrast, the methods we will employ mainly depend on the methods 
of analysis. Furthermore, as noted below, the results of [Straccia et al., 2009] 
are broadly representative of those obtained by order theory. Therefore, it 
will help to state a result of [Straccia et al., 2009], which gives a flavour of 
its contents and is typical of results obtained in the field by order theory. 
However, to do this requires the statement of some preliminary definitions, 
but they will be needed in any case as we proceed. 

Given the complete lattice $(L, \le)$ and its power set $\mathcal{P}(L)$, we define three 
orderings on $\mathcal{P}(L)$ familiar in semantics and domain theory, as follows, see 
[Abramsky and Jung, 1994]. First, the Smyth ordering $\preceq_S$ defined by $X \preceq_S Y$ 
if and only if for each $y \in Y$ there exists $x \in X$ such that $x \le y$. Second, we 
define the Hoare ordering $\preceq_H$ by $X \preceq_H Y$ if and only if for each $x \in X$ there 
exists $y \in Y$ such that $x \le y$. Finally, we define the Egli-Milner ordering $\preceq_{EM}$ 
by $X \preceq_{EM} Y$ if and only if $X \preceq_S Y$ and $X \preceq_H Y$. Next, we say that $T$ is 
Smyth monotonic or simply S-monotonic if, for all $x, y \in X$ satisfying $x \le y$, 
we have $T(x) \preceq_S T(y)$. The notions of Hoare monotonicity and Egli-Milner 
monotonicity are defined similarly. 
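The three orderings and the notion of S-monotonicity are easy to express for finite sets. The Python sketch below is an illustration only: the function names are ours, the underlying order is passed in as a predicate `leq`, and the example uses the four-element chain 0 < 1 < 2 < 3 as a (complete) lattice together with a hypothetical mapping `T`.

```python
def smyth_leq(X, Y, leq):
    """X is Smyth-below Y: every y in Y has some x in X with x <= y."""
    return all(any(leq(x, y) for x in X) for y in Y)

def hoare_leq(X, Y, leq):
    """X is Hoare-below Y: every x in X has some y in Y with x <= y."""
    return all(any(leq(x, y) for y in Y) for x in X)

def egli_milner_leq(X, Y, leq):
    """X is Egli-Milner-below Y: both of the above hold."""
    return smyth_leq(X, Y, leq) and hoare_leq(X, Y, leq)

def is_s_monotonic(T, L, leq):
    """T(x) is Smyth-below T(y) whenever x <= y in L."""
    return all(smyth_leq(T[x], T[y], leq) for x in L for y in L if leq(x, y))

leq = lambda a, b: a <= b
L = [0, 1, 2, 3]                                  # a finite chain
T = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}, 3: {3}}     # hypothetical multivalued mapping

smyth_example = smyth_leq({1, 3}, {2}, leq)       # 1 <= 2, so this holds
hoare_example = hoare_leq({1, 3}, {2}, leq)       # nothing in {2} lies above 3
s_monotonic = is_s_monotonic(T, L, leq)
```

The example with `{1, 3}` and `{2}` shows that the Smyth and Hoare orderings can disagree, which is why the Egli-Milner ordering requires both.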

We are now in a position to present the following result of Straccia, Ojeda- 
Aciego, and Damásio, see [Straccia et al., 2009, Proposition 3.10]. 


4.9.1 Proposition Let $T : L \to \mathcal{P}(L)$ be a multivalued mapping, where $L$ is 
a complete lattice. 


(a) If $T$ is S-monotonic and for all $x \in L$, $T(x)$ has a least element, then $T$ 
has a least fixed point. 


(b) If $T$ is H-monotonic and for all $x \in L$, $T(x)$ has a greatest element, then 
$T$ has a greatest fixed point. 


Straccia et al. also introduce a very general class of logic programs P, a 
class much more general than conventional disjunctive logic programs, and 
proceed to define a multivalued semantic operator $T_P$ associated with each 
program $P$ in the class in question. On applying their fixed-point theorems, 
they establish a one-to-one correspondence between the models of any program 
$P$ and the fixed points of $T_P$. All these results are order-theoretic in nature, 
although, in summarizing their conclusions, the question of deriving fixed- 
point theorems for multivalued mappings using methods from analysis is raised 
by the authors, but not taken up in detail. 

Thus, we will focus here mainly on those fixed-point theorems for multi- 
valued mappings which employ analytical methods and results in their for- 
mulation or in their proofs, rather than on results which depend primarily on 
order theory. This is partly for the reason stated at the end of the previous 
paragraph and partly because the results of [Straccia et al., 2009] largely sub- 
sume the order-theoretic results derived by several other contributors to this 
subject anyway, except that the latter are usually presented in the context 
of complete partial orders rather than in the less general context of complete 
lattices employed by Straccia and his co-authors. On the other hand, most 
other authors require the condition that the multivalued mapping $T$ is non-empty 
in the sense that, for all $x \in X$, we have $T(x) \ne \emptyset$, a condition that 
Straccia et al. do not impose. However, despite the opening sentence of this 
paragraph, we do wish to consider a result of our own which gives a form, for 
multivalued mappings, of the Rutten-Smyth theorem discussed earlier, Theo- 
rem 4.6.3, and its role in unifying the order-theoretic and metric approaches to 
the fixed-point theory of multivalued mappings, and this of course necessitates 
some discussion of order theory. 

In fact, it turns out that the majority of the fixed-point theorems we have 
already considered earlier in this chapter can be directly carried over to the 
multivalued setting, and indeed our main task now is to carry out this exten- 
sion. Thus, we present multivalued versions of the Knaster-Tarski theorem, the 
Banach contraction mapping theorem, the Rutten-Smyth theorem referred to 
in the previous paragraph, and Kleene’s theorem. We do not, however, include 
any applications of these results here, although they do indeed have a number 
of applications to the semantics of (conventional) disjunctive logic programs, 
see [Khamsi et al., 1993, Khamsi and Misane, 1998, Hitzler and Seda, 1999c, 
Hitzler and Seda, 2002a]. 


4.10 Partial Orders and Multivalued Mappings 


Throughout, $T : X \to \mathcal{P}(X)$ will denote a multivalued mapping defined 
on X. Furthermore, unless stated to the contrary, T will be assumed to be 
non-empty. 

We begin by discussing a fixed-point theorem first established by M.A. 
Khamsi and D. Misane, see [Khamsi and Misane, 1998]. It can be viewed as a 
multivalued version of the Knaster-Tarski theorem, Theorem 1.1.10; a multi- 
valued version of Kleene’s theorem, Theorem 1.1.9 will be presented in Section 
4.13. 


4.10.1 Definition Let $T : X \to \mathcal{P}(X)$ be a multivalued mapping defined on 
$X$. An orbit of $T$ is a net $(x_i)_{i \in \mathcal{I}}$ in $X$, where $\mathcal{I}$ denotes an ordinal, such that 
$x_{i+1} \in T(x_i)$ for all $i \in \mathcal{I}$. An orbit $(x_i)_{i \in \mathcal{I}}$ of $T$ is called an $\omega$-orbit if $\mathcal{I}$ is 
the first limit ordinal, $\omega$. An orbit $(x_i)_{i \in \mathcal{I}}$ of $T$ will be said to be eventually 
constant if there is a tail $(x_i)_{i_0 \le i}$ of $(x_i)_{i \in \mathcal{I}}$ which is constant in that $x_i = x_j$ 
for all $i, j \in \mathcal{I}$ satisfying $i_0 \le i, j$. 


If $T : X \to \mathcal{P}(X)$ is a multivalued mapping and $x$ is a fixed point of $T$,
then we obtain an orbit of $T$ which is eventually constant by setting $x =
x_0 = x_1 = x_2 = \cdots$. Conversely, suppose that $(x_i)_{i \in \mathcal{I}}$ is an orbit of $T$ with the
property that $x_{i+1} = x_i$ for all $i \in \mathcal{I}$ satisfying $i_0 \le i$, for some ordinal $i_0 \in \mathcal{I}$.
Then $x_{i_0} = x_{i_0+1} \in T(x_{i_0})$, and we have a fixed point $x_{i_0}$ of $T$. Thus, having
a fixed point and having an orbit which is eventually constant are essentially
equivalent conditions on $T$.


4.10.2 Definition Suppose that $T$ is a multivalued mapping defined on a
partially ordered set $X$. An orbit $(x_i)_{i \in \mathcal{I}}$ of $T$ is said to be increasing if we have
$x_i \le x_j$ for all $i, j \in \mathcal{I}$ satisfying $i \le j$ and is said to be eventually increasing
if some tail of the orbit is increasing. Finally, an increasing orbit $(x_i)_{i \in \mathcal{I}}$ of $T$
is said to be tight if, for all limit ordinals $j \in \mathcal{I}$, we have $x_j = \bigsqcup\{x_i \mid i < j\}$.


Suppose that $(x_i)_{i \in \mathcal{I}}$ is an increasing orbit of $T$ and that $j \in \mathcal{I}$ is a limit
ordinal. Then $x_{j+1}$ is an element of $T(x_j)$ such that $x_i \le x_{j+1}$ for all $i < j$, and
of course $\bigsqcup\{x_i \mid i < j\} \le x_j \le x_{j+1}$ if the supremum exists. In particular,
any increasing orbit $(x_i)_{i \in \mathcal{I}}$ which is tight (if such exists) must satisfy the
following condition: for any limit ordinal $j$, there exists $x = x_{j+1}$ such that


$$x \in T\Bigl(\bigsqcup\{x_i \mid i < j\}\Bigr) \quad\text{and}\quad \bigsqcup\{x_i \mid i < j\} \le x. \tag{4.1}$$


This condition is a slight variant of a condition which was identified by Khamsi 
and Misane as a sufficient condition for the existence of fixed points of Hoare 
monotonic multivalued mappings. In fact, the following result was established 
by them, see [Khamsi and Misane, 1998], except that it was formulated for 
decreasing orbits and infima, and we have chosen to work with the dual notions 
instead to be consistent with the form of Kleene’s theorem we give later. 


4.10.3 Theorem (Knaster-Tarski multivalued) Suppose that $X$ is a
complete partial order and that $T : X \to \mathcal{P}(X)$ is a multivalued mapping
which is non-empty, Hoare monotonic, and satisfies condition (4.1). Then $T$
has a fixed point.


We omit details of the proof of this result except to observe that, starting
with the bottom element $x_0 = \bot$ of $X$, the condition (4.1) permits the
construction, transfinitely, of a tight orbit $(x_i)$ of $T$. Since this can be carried
out for ordinals whose underlying cardinal is greater than that of $X$, we are
forced to conclude that $(x_i)$ is eventually constant and therefore that $T$ has a
fixed point.
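
In the finite case the transfinite construction reduces to ordinary iteration, and the following Python sketch illustrates it. The disjunctive program, its encoding in `RULES`, and the operator `T` (which adds one chosen head atom for each rule whose body holds) are illustrative assumptions of ours and are not taken from [Khamsi and Misane, 1998]. On a finite powerset there are no limit ordinals to consider, so condition (4.1) plays no role in this sketch.

```python
from itertools import product

# Hypothetical disjunctive program: each rule is (heads, body), read as
# "h1 v h2 v ... <- b1, b2, ...". The atom names are illustrative only.
RULES = [
    (("a", "b"), ()),      # a v b <-
    (("c",), ("a",)),      # c <- a
    (("c",), ("b",)),      # c <- b
]


def T(x):
    """Multivalued operator: add one chosen head atom per applicable rule."""
    applicable = [heads for heads, body in RULES if set(body) <= x]
    return {frozenset(x | set(choice)) for choice in product(*applicable)}


def increasing_orbit(x0, max_steps=100):
    """Follow an increasing orbit x0 <= x1 <= ... until x_i lies in T(x_i)."""
    orbit = [x0]
    for _ in range(max_steps):
        x = orbit[-1]
        successors = T(x)
        if x in successors:              # x is a fixed point of T
            return orbit
        above = [y for y in successors if x <= y]
        if not above:
            raise ValueError("no successor above the current point")
        # deterministic choice among the admissible successors
        orbit.append(min(above, key=lambda s: (len(s), sorted(s))))
    raise RuntimeError("orbit did not stabilise")


orbit = increasing_orbit(frozenset())    # start at the bottom element
fixed_point = orbit[-1]
```

Other choices of successor lead to other fixed points; the sketch fixes one choice only to make the run reproducible.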

Noting that $\bigsqcup\{x_i \mid i < j\} = \bigsqcup\{x_{i+1} \mid i < j\}$, one can view condition (4.1)
schematically as the statement “$\bigsqcup\{T(x_i) \mid i < j\} \le T(\bigsqcup\{x_i \mid i < j\})$”, and it
can therefore be thought of as a rather natural, weak continuity condition on 
T which is automatically satisfied by any monotonic single-valued mapping T 
on a complete partial order. The question of when the orbit constructed in the 
previous paragraph becomes constant in not more than w steps is a question 
of continuity, as in the single-valued version, and will be taken up in Section 
4.13. 

Theorem 4.10.3 was established by Khamsi and Misane in order to 
show the existence of (consistent) answer sets for a class of disjunctive 
logic programs called signed programs. We have shown elsewhere, see 
[Hitzler and Seda, 1999c], that it sometimes is necessary to work transfinitely 
in practice, a point which justifies the name “Knaster-Tarski theorem” applied 
to Theorem 4.10.3. 

Thus, in summary, Hoare monotonicity of T together with (4.1) gives, 
for multivalued mappings, an exact analogue of the fixed-point theory for 
monotonic single-valued mappings due to Knaster-Tarski. Moreover, there are 
applications of it to the semantics of disjunctive logic programs which parallel 
those made in the standard, non-disjunctive case. 


4.11 Metrics and Multivalued Mappings


We discuss here a result established by M.A. Khamsi, V. Kreinovich, and 
D. Misane, see [Khamsi et al., 1993], which is a multivalued version of the 
Banach contraction mapping theorem, Theorem 4.2.3. 


4.11.1 Definition Let $(X, d)$ be a metric space. A multivalued mapping $T :
X \to \mathcal{P}(X)$ is called a contraction if there exists a non-negative real number
$\lambda < 1$ such that for every $x \in X$, for every $y \in X$, and for all $a \in T(x)$ there
exists $b \in T(y)$ such that $d(a, b) \le \lambda d(x, y)$.


The result we wish to state is as follows; a proof of it will be given in 
Section 4.13. 


4.11.2 Theorem (Banach multivalued) Let $X$ be a complete metric
space, and suppose that $T$ is a multivalued contraction on $X$ such that, for
every $x \in X$, the set $T(x)$ is closed and non-empty. Then $T$ has a fixed point.


This theorem was also established with a specific objective in view, namely, 
to show the existence of answer sets for disjunctive logic programs which are 
countably stratified, again see [Khamsi et al., 1993]. 


4.12 Generalized Ultrametrics and Multivalued Mappings


We next turn our attention to multivalued versions of the Prieß-Crampe
and Ribenboim theorem, Theorem 4.3.6.


4.12.1 Definition Let $(X, \varrho, \Gamma)$ be a generalized ultrametric space. A
multivalued mapping $T$ defined on $X$ is called strictly contracting (on $X$)
(respectively, non-expanding (on $X$)) if, for all $x, y \in X$ with $x \ne y$ and for
every $a \in T(x)$, there exists an element $b \in T(y)$ such that $\varrho(a, b) < \varrho(x, y)$
($\varrho(a, b) \le \varrho(x, y)$). Furthermore, the mapping $T$ is called strictly contracting
on orbits if, for every $x \in X$ and for every $a \in T(x)$ with $a \ne x$, there exists
an element $b \in T(a)$ such that $\varrho(a, b) < \varrho(a, x)$.


For $T : X \to \mathcal{P}(X)$, let $\Pi_x = \{\varrho(x, y) \mid y \in T(x)\}$, and, for a subset $A \subseteq \Gamma$,
denote by $\operatorname{Min} A$ the set of all minimal elements of $A$.

Note that these definitions collapse to those already considered for
single-valued mappings if, in fact, $T$ is single valued, meaning that $T(x)$ is a singleton
set for each $x \in X$.

The following theorem was proved by Prieß-Crampe and Ribenboim, see
[Prieß-Crampe and Ribenboim, 2000c], and is a multivalued version of
Theorem 4.3.6.


4.12.2 Theorem (Prieß-Crampe and Ribenboim) Let $(X, \varrho, \Gamma)$ be a
spherically complete, generalized ultrametric space, and let $T : X \to \mathcal{P}(X)$
be non-empty, non-expanding, and strictly contracting on orbits. In addition,
assume that for every $x \in X$, $\operatorname{Min} \Pi_x$ is finite and that every element of $\Pi_x$
has a lower bound in $\operatorname{Min} \Pi_x$. Then $T$ has a fixed point.


This result has several corollaries, due to Prieß-Crampe and Ribenboim,
see [Prieß-Crampe and Ribenboim, 2000c], both for multivalued mappings
and for single-valued mappings, and we state two of these next for complete- 
ness. Theorem 4.12.2 has been applied to establish the stable model semantics 
for disjunctive logic programs, see [Seda and Hitzler, 2010]. Note that Theo- 
rem 4.12.4 is a slight extension of Theorem 4.3.6. 


4.12.3 Theorem Let $(X, \varrho, \Gamma)$ be spherically complete, and let $\Gamma$ be narrow,
that is, such that every trivially ordered subset of $\Gamma$ is finite. Let $f : X \to \mathcal{P}(X)$
be non-empty, strictly contracting on orbits and such that $f(x)$ is spherically
complete for every $x \in X$. Then $f$ has a fixed point.


4.12.4 Theorem Let $(X, \varrho, \Gamma)$ be a spherically complete, generalized
ultrametric space, and let $f : X \to X$ be non-expanding on $X$. Then either $f$
has a fixed point or there exists a ball $B_\pi(z)$ such that $\varrho(y, f(y)) = \pi$ for all
$y \in B_\pi(z)$. If, in addition, $f$ is strictly contracting on orbits, then $f$ has a
fixed point. Finally, this fixed point is unique if $f$ is strictly contracting on $X$.


The following ideas are closely related to the notion of value semigroup 
given in Definition 4.1.2 and were considered by Khamsi, Kreinovich, and 
Misane in the context of the stable model semantics for disjunctive logic pro- 
grams, see [Khamsi et al., 1993]. We show that, in fact, these notions basically 
coincide with those from generalized ultrametric theory. 


4.12.5 Definition Let $V$ be an ordered Abelian semigroup with 0, and let
$X$ be an arbitrary set. A g-metric on $X$ is a mapping $\rho : X \times X \to V$ which
satisfies the following conditions for all $x, y, z \in X$.


(1) $\rho(x, y) = 0$ if and only if $x = y$.
(2) $\rho(x, y) = \rho(y, x)$.
(3) $\rho(x, y) \le \rho(x, z) + \rho(z, y)$.


A pair $(X, \rho)$ consisting of a set $X$ and a g-metric $\rho$ on $X$ is called a g-metric
space.


In fact, g-metrics were called generalized metrics by Khamsi, Kreinovich,
and Misane, but we have changed the terminology since the term “generalized
metric” is, of course, already used differently by us. Actually, we will not work
with g-metrics in general since the closely related generalized ultrametrics will
suffice for our purposes. Indeed, we consider this relationship next, and we
begin by recalling the observations we made in Remark 4.3.2. Thus, let $V$
denote the set of all expressions of the type 0 or $2^{-\alpha}$, where $\alpha > 0$ is an
ordinal. An order is defined on $V$ by: $0 \le v$ for every $v \in V$, and $2^{-\alpha} < 2^{-\beta}$ if
and only if $\beta < \alpha$. As a semigroup operation $u + v$, we will use the maximum
$\max(u, v)$. It will be convenient to write $\frac{1}{2} 2^{-\alpha} = 2^{-(\alpha+1)}$.
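
For finite exponents, the value set $V$ and the choice of the maximum as semigroup operation can be tried out directly. The Python sketch below is an illustration under our own assumptions: it uses only values $2^{-n}$ with $n$ a positive integer (no infinite ordinals), represents them as fractions, and takes as hypothetical g-metric `rho` the first-difference distance on binary words of length 3. The names `rho` and `is_g_metric` are ours.

```python
from fractions import Fraction
from itertools import product


def rho(x, y):
    """Distance 2^-(k+1), where k is the first index at which x and y differ."""
    if x == y:
        return Fraction(0)
    k = next(i for i, (a, b) in enumerate(zip(x, y)) if a != b)
    return Fraction(1, 2 ** (k + 1))


def is_g_metric(points, dist, plus=max):
    """Check conditions (1)-(3) of a g-metric, with `plus` as semigroup operation."""
    for x, y in product(points, repeat=2):
        if (dist(x, y) == 0) != (x == y):      # condition (1)
            return False
        if dist(x, y) != dist(y, x):           # condition (2)
            return False
    # condition (3), the triangle inequality with u + v read as max(u, v)
    return all(dist(x, y) <= plus(dist(x, z), dist(z, y))
               for x, y, z in product(points, repeat=3))


points = list(product((0, 1), repeat=3))       # all 0/1 words of length 3
ok = is_g_metric(points, rho)
```

With the maximum as semigroup operation, condition (3) is the strong triangle inequality, which is why such a g-metric behaves like a generalized ultrametric.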

The following definition is due to Khamsi, Kreinovich, and Misane, see 
[Khamsi et al., 1993]. 


4.12.6 Definition Assume that $\alpha$ is either a countable ordinal or $\omega_1$, the first
uncountable ordinal, and that $\mathbf{v} = (v_\beta)_{\beta < \alpha}$ is a decreasing family of elements
of $V$. Let $X$ be a g-metric space relative to $V$, and let $(x_\beta)_{\beta < \alpha}$ be a family of
elements of $X$.


(1) $(x_\beta)$ is said to $\mathbf{v}$-cluster to $x \in X$ if, for all $\beta$, we have $\rho(x_\beta, x) < v_\beta$
whenever $\beta < \alpha$.


(2) $(x_\beta)$ is said to be $\mathbf{v}$-Cauchy if, for all $\beta$ and $\gamma$, we have $\rho(x_\beta, x_\gamma) < v_\beta$
whenever $\beta < \gamma < \alpha$.


(3) $X$ is said to be $\mathbf{v}$-complete or just complete if, for every $\mathbf{v}$, every $\mathbf{v}$-Cauchy
family $\mathbf{v}$-clusters to some element in $X$.


(4) A set $Y \subseteq X$ will be called $\mathbf{v}$-complete or just complete if, for every $\mathbf{v}$,
whenever a $\mathbf{v}$-Cauchy family consists of elements of $Y$, it $\mathbf{v}$-clusters to
some element of $Y$.


A close relationship exists between the notion of completeness for g-metrics 
and the notion of trans-completeness, Definition 4.3.8, for generalized ultra- 
metrics. Indeed, we show that these notions coincide by showing equivalence 
between completeness for g-metrics and spherical completeness for generalized 
ultrametrics, see Proposition 4.3.10. 


4.12.7 Definition A multivalued mapping $T : X \to \mathcal{P}(X)$ is called a
$(\frac{1}{2})$-contraction if, for every $x \in X$, for every $y \in X$, and for every $a \in T(x)$,
there exists $b \in T(y)$ such that $\rho(a, b) \le \frac{1}{2} \rho(x, y)$.


The following theorem was proved by Khamsi, Kreinovich, and Misane in 
[Khamsi et al., 1993]. 


4.12.8 Theorem Let $X$ be a complete g-metric space, let $T$ be a multivalued
$(\frac{1}{2})$-contraction defined on $X$ such that $T(x)$ is not empty for some $x \in X$
(so that $T$ is not identically empty), and suppose that for every $x \in X$ the set
$T(x)$ is complete. Then $T$ has a fixed point.


We next present some results relating those just given to the notion of
spherical completeness we discussed earlier. Indeed, we show that if $(X, \rho)$ is
a g-metric space with respect to $V$ as given in Definition 4.3.2, then $(X, \rho)$ is a
generalized ultrametric space, and vice-versa.


4.12.9 Proposition Let $(X, \rho)$ be a complete g-metric space with respect to
$V$. Then $X$ is spherically complete as an ultrametric space.


Proof: Let $\mathcal{B} = (B_{v_\beta}(x_\beta))_{\beta < \alpha}$ be a decreasing chain of balls in $X$, and without
loss of generality assume that it is strictly decreasing and that $\alpha$ is a limit
ordinal. We have to show that $\bigcap \mathcal{B} \ne \emptyset$. Let $\mathbf{v} = (v_\beta)_\beta$. Since $\mathcal{B}$ is a chain, it
is easy to see that $(x_{\beta+1})$ is $\mathbf{v}$-Cauchy and therefore, by completeness of $X$,
$(x_{\beta+1})$ $\mathbf{v}$-clusters to some $x \in X$. By definition, this means that $\rho(x_{\beta+1}, x) <
v_\beta$ and therefore that $x \in B_{v_\beta}(x_{\beta+1}) = B_{v_\beta}(x_\beta)$ for all $\beta$. Thus, $x \in \bigcap \mathcal{B}$. ∎


In the opposite direction, we have the following result. 


4.12.10 Proposition Let $(X, \varrho, V)$ be a spherically complete, generalized
ultrametric space. Then $X$ is complete as a g-metric space.


Proof: Let $\mathbf{v} = (v_\beta)$ be a decreasing family of elements of $V$ which is, without
loss of generality, strictly decreasing, and let $(x_\beta)$ be $\mathbf{v}$-Cauchy. For $v \in \mathbf{v}$,
for example, $v = 2^{-\alpha}$, let $v'$ denote $2^{-(\alpha+1)}$. Then $\mathcal{B} = (B_{v'_\beta}(x_\beta))_\beta$ is a
decreasing chain of balls in $X$. By spherical completeness, it has non-empty
intersection. Choose $x \in \bigcap \mathcal{B}$. Then for all $\beta$ we obtain $\rho(x_\beta, x) \le v'_\beta < v_\beta$,
and so $(x_\beta)$ $\mathbf{v}$-clusters to $x$. ∎


This means, by virtue of Theorem 4.12.2, that we can reformulate the 
assumptions in Theorem 4.12.8 and thereby obtain the following result, which, 
in fact, is a special case of a theorem of Prieß-Crampe and Ribenboim, see
[Prieß-Crampe and Ribenboim, 2000c, (3.4)].


4.12.11 Theorem Let $X$ be a spherically complete, generalized ultrametric
space (with respect to $V$), and let $T$ be a multivalued, non-empty, and strictly
contracting mapping defined on $X$ such that $T(x)$ is spherically complete for
all $x \in X$. Then $T$ has a fixed point.


4.13 Quasimetrics and Multivalued Mappings 


We move next to study a multivalued version of the Rutten-Smyth the- 
orem, Theorem 4.6.3. As a consequence, we obtain a multivalued version of 
Kleene’s theorem, Theorem 1.1.9.²⁶


²⁶ For further details, see [Hitzler and Seda, 1999c].


4.13.1 Definition Let $(X, d)$ be a quasimetric space. A multivalued mapping
$T : X \to \mathcal{P}(X)$ is called a contraction if there is a $\lambda \in [0, 1)$ such that, for
all $x, y \in X$ and for all $a \in T(x)$, there exists $b \in T(y)$ satisfying $d(a, b) \le
\lambda d(x, y)$. We say that $T$ is non-expanding if, for all $x, y \in X$ and for all
$a \in T(x)$, there exists $b \in T(y)$ satisfying $d(a, b) \le d(x, y)$.


Again, these definitions are clearly extensions of the corresponding defi- 
nitions made for single-valued mappings and indeed collapse to them in the 
case where T is single valued. An obvious and natural definition of continuity 
of $T$ is the following: for every Cauchy sequence $(x_n)$ in $X$ with limit $x$ and
for every choice of $y_n \in T(x_n)$, we have that $(y_n)$ is a Cauchy sequence and
$\lim y_n \in T(x)$. In fact, the weaker definition following, which is implied by the
one just given, suffices for our purposes and will be used throughout. 


4.13.2 Definition Let $T : X \to \mathcal{P}(X)$ be a multivalued mapping defined on
a quasimetric space $(X, d)$. We say that $T$ is continuous if we have $\lim x_n \in
T(\lim x_n)$ for every $\omega$-orbit $(x_n)$ of $T$ which is a Cauchy sequence.


Once more, this definition collapses to a natural one if $T$ is single valued.
In fact, if $T$ is single valued, it simply states the condition that $\lim T(x_n) =
\lim x_{n+1} = \lim x_n = T(\lim x_n)$ for every $\omega$-orbit which is a Cauchy
sequence, which is a weaker condition than that of CS-continuity as in Definition
4.6.2(1).


Finally, if $(X, d)$ is a quasimetric space, we define the associated partial
order $\le_d$ on $X$ by $x \le_d y$ if and only if $d(x, y) = 0$, see Section 4.6.

The main result of this section is the following theorem, generalizing the 
Rutten-Smyth theorem we gave earlier, Theorem 4.6.3. 


4.13.3 Theorem (Rutten-Smyth multivalued) Let $(X, d)$ be a
CS-complete quasimetric space, and let $T : X \to \mathcal{P}(X)$ denote a non-empty
and continuous multivalued mapping on $X$. Then $T$ has a fixed point if either
of the following two conditions holds.


(a) $T$ is a contraction.


(b) $T$ is non-expanding, and there is $x_0 \in X$ and $x_1 \in T(x_0)$ such that
$d(x_0, x_1) = 0$, that is, $x_0 \le_d x_1$.


Proof: (a) Let $x_0 \in X$. Since $T(x_0) \ne \emptyset$, we can choose $x_1 \in T(x_0)$. Since $T$ is
a contraction, there is $x_2 \in T(x_1)$ such that $d(x_1, x_2) \le \lambda d(x_0, x_1)$. Applying
this argument repeatedly, we obtain a sequence $(x_n)$ such that for all $n \ge 0$
we have $x_{n+1} \in T(x_n)$ and $d(x_{n+1}, x_{n+2}) \le \lambda d(x_n, x_{n+1})$. Thus, $(x_n)$ is an
$\omega$-orbit. Using the triangle inequality, we obtain


$$d(x_n, x_{n+m}) \le \sum_{i=0}^{m-1} d(x_{n+i}, x_{n+i+1}) \le \sum_{i=0}^{m-1} \lambda^{n+i} d(x_0, x_1).$$


Since the last summation here is dominated by $\frac{\lambda^n}{1-\lambda} d(x_0, x_1)$, we see that $(x_n)$
is a (forward) Cauchy sequence in $X$ and therefore is an $\omega$-orbit of $T$ which
is Cauchy. Since $X$ is complete, $(x_n)$ has a limit $x_\omega$. Now, by continuity of $T$,
we obtain $x_\omega \in T(x_\omega)$, and $x_\omega$ is a fixed point of $T$, as required.

(b) Let $x_0 \in X$ and $x_1 \in T(x_0)$ satisfy $d(x_0, x_1) = 0$. Since $T$ is
non-expanding, there is $x_2 \in T(x_1)$ with $d(x_1, x_2) \le d(x_0, x_1) = 0$. Inductively,
we obtain a sequence $(x_n)$ such that $x_{n+1} \in T(x_n)$ and $d(x_n, x_{n+k}) \le
\sum_{i=0}^{k-1} d(x_{n+i}, x_{n+i+1}) = 0$. Hence, $(x_n)$ is an orbit of $T$ which is forward
Cauchy and therefore has a limit $x_\omega$. By continuity of $T$ again, we see that
$x_\omega$ is a fixed point of $T$. ∎
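
The construction in Part (a) can be traced numerically. The following Python sketch assumes the hypothetical mapping $T(x) = \{x/2,\; x/2 + 1\}$ on the real line with the usual metric, which is a contraction with $\lambda = 1/2$ and has closed, non-empty values; the function names, the starting point, and the tolerance `tol` are our own choices and serve only as an illustration.

```python
def T(x):
    """Hypothetical multivalued map on the reals: T(x) = {x/2, x/2 + 1}."""
    return (x / 2.0, x / 2.0 + 1.0)


def contraction_orbit(x0, lam=0.5, steps=50, tol=1e-12):
    """Build x0, x1, x2, ... with x_{n+1} in T(x_n), as in the proof of Part (a)."""
    orbit = [x0, T(x0)[0]]                 # x1 is an arbitrary element of T(x0)
    for _ in range(steps):
        x_prev, x_cur = orbit[-2], orbit[-1]
        bound = lam * abs(x_prev - x_cur)
        # elements b of T(x_cur) with d(x_cur, b) <= lam * d(x_prev, x_cur)
        admissible = [b for b in T(x_cur) if abs(x_cur - b) <= bound + tol]
        if not admissible:
            raise ValueError("no admissible successor found")
        # deterministic choice: the admissible element nearest to x_cur
        orbit.append(min(admissible, key=lambda b: abs(x_cur - b)))
    return orbit


orbit = contraction_orbit(3.0)
limit = orbit[-1]
# distance from the last point to the set T(limit); it tends to 0 along the orbit
defect = min(abs(limit - b) for b in T(limit))
```

Starting from 3.0 with this choice rule, the orbit approaches the fixed point 2; the mapping also has the fixed point 0, which a different choice of successors would reach.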


The proof given here of Part (a) of Theorem 4.13.3 is, up to the last 
step, exactly the same as the first half of the proof of the multivalued Ba- 
nach contraction mapping theorem, Theorem 4.11.2, established by Khamsi, 
Kreinovich, and Misane, except that we are working with a quasimetric rather 
than with a metric and therefore care needs to be taken that no use is made 
of symmetry. On the other hand, the proof we give next of Theorem 4.11.2, 
which roughly corresponds to the second half of the proof given by Khamsi, 
Kreinovich, and Misane, is shorter and technically somewhat simpler than the 
proof given by them. 


Proof of Theorem 4.11.2 We show that the condition that T(x) is closed 
for every x together with that of T being a contraction implies that T is 
continuous, and the result then follows from Part (a) of Theorem 4.13.3. 

First note that (X, d) being a complete metric space means that (X, d) is 
complete as a quasimetric space, and obviously T satisfies Part (a) of Theorem 
4.13.3. Now suppose that $(x_n)$ is an orbit of $T$ which is a forward Cauchy
sequence and, hence, a Cauchy sequence; we want to show that $x_\omega \in T(x_\omega)$,
where $x_\omega$ is the limit of $(x_n)$.

Since $T$ is a contraction, for every $n$ there exists $y_n \in T(x_\omega)$ such that
$d(x_{n+1}, y_n) \le \lambda d(x_n, x_\omega)$. Therefore, $d(y_n, x_\omega) \le d(y_n, x_{n+1}) + d(x_{n+1}, x_\omega) \le
\lambda d(x_n, x_\omega) + d(x_{n+1}, x_\omega)$. Hence, we have $y_n \to x_\omega$. But each $y_n \in T(x_\omega)$,
and $T(x)$ is closed for every $x$. Consequently, the limit $x_\omega$ of the sequence $(y_n)$
also belongs to $T(x_\omega)$. So, $x_\omega \in T(x_\omega)$, and it follows that $T$ is continuous, as
required. ∎


Thus, Theorem 4.13.3 contains, as a consequence, the multivalued Banach 
contraction mapping theorem, Theorem 4.11.2, discussed earlier. It also con- 
tains a natural extension of Kleene’s theorem to multivalued mappings, The- 
orem 4.13.6 below, as we show next. Thus, Theorem 4.13.3 gives a unification 
of metric and order-theoretic notions in direct analogy with the corresponding 
unification given, in the single-valued case, by Theorem 4.6.3. 

In order to proceed, we make some preliminary and elementary observa- 
tions, as follows, concerning partially ordered sets and the quasimetrics they 
carry, see Section 4.6. The proofs are straightforward and are omitted. 

4.13.4 Proposition Let $(X, \le)$ be a partial order, and let $(X, d)$ denote the
associated quasimetric space, so that $d = d_\le$ as in Section 4.6. Then the
following hold.


(a) A non-empty multivalued mapping $T : X \to \mathcal{P}(X)$ is Hoare monotonic if
and only if it is non-expanding.


(b) A sequence $(x_n)$ in $X$ is eventually increasing in $(X, \le)$ if and only if it
is a Cauchy sequence in $(X, d)$.


(c) The partially ordered set $(X, \le)$ is $\omega$-complete if and only if $(X, d)$ is
complete as a quasimetric space. Furthermore, in the presence of either
form of completeness, the limit of any Cauchy sequence is the least upper
bound of any increasing tail of the sequence.
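
Part (a) can be checked exhaustively on a small example. The Python sketch below rests on two assumptions that are not restated in this section: that the quasimetric of Section 4.6 is given by $d_\le(x, y) = 0$ if $x \le y$ and $1$ otherwise, and that $T$ is Hoare monotonic if $x \le y$ implies that every $a \in T(x)$ lies below some $b \in T(y)$. The three-element chain and all function names are our own illustrative choices.

```python
from itertools import combinations, product

X = (0, 1, 2)                      # the chain 0 <= 1 <= 2


def leq(x, y):
    return x <= y


def d(x, y):
    """Assumed quasimetric associated with the order: 0 if x <= y, else 1."""
    return 0 if leq(x, y) else 1


def nonempty_subsets(xs):
    return [frozenset(c) for r in range(1, len(xs) + 1)
            for c in combinations(xs, r)]


def hoare_monotonic(T):
    return all(any(leq(a, b) for b in T[y])
               for x in X for y in X if leq(x, y) for a in T[x])


def non_expanding(T):
    return all(any(d(a, b) <= d(x, y) for b in T[y])
               for x in X for y in X for a in T[x])


# every non-empty multivalued mapping on X, stored as a dict x -> T(x)
maps = [dict(zip(X, images))
        for images in product(nonempty_subsets(X), repeat=len(X))]
agree = all(hoare_monotonic(T) == non_expanding(T) for T in maps)
```

The two predicates agree on all 343 mappings, as Part (a) predicts: when $x \not\le y$ the bound $d(x, y) = 1$ is met by any $b$, so only comparable pairs impose a constraint.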


Notice that neither Part (c) of this result nor the next definition assumes 
the presence of a bottom element. 


4.13.5 Definition Let the partial order $(X, \le)$ be $\omega$-complete, and let $T :
X \to \mathcal{P}(X)$ be a non-empty multivalued mapping on $X$. We say that $T$ is
$\omega$-continuous if $T$ is Hoare monotonic, and for any $\omega$-orbit $(x_n)$ of $T$ which
is eventually increasing, we have $\bigsqcup(x_n) \in T(\bigsqcup(x_n))$, where the supremum is
taken over any increasing tail of $(x_n)$.


We obtain finally the following form of Kleene’s theorem for multivalued 
mappings as an easy corollary of our Theorem 4.13.3. This theorem has been 
applied by the present authors to find answer sets for certain classes of dis- 
junctive logic programs, see [Hitzler and Seda, 1999c]. 


4.13.6 Theorem (Kleene multivalued) Let $(X, \le)$ be an $\omega$-complete
partial order (with bottom element), and let $T : X \to \mathcal{P}(X)$ be a non-empty,
$\omega$-continuous multivalued mapping on $X$. Then $T$ has a fixed point.


Proof: Since $(X, \le)$ is $\omega$-complete, the associated quasimetric space $(X, d)$
(with $d = d_\le$ as in Section 4.6) is complete by Proposition 4.13.4. Furthermore,
$T$ is Hoare monotonic, since it is $\omega$-continuous, and is therefore non-expanding
by Proposition 4.13.4 again. On taking $x_0 = \bot$ and $x_1 \in T(x_0)$ arbitrarily,
we have $x_0$ and $x_1$ satisfying $d(x_0, x_1) = 0$. The result will therefore follow
from Part (b) of Theorem 4.13.3 as soon as we have established that $T$ is
continuous in the sense of Definition 4.13.2.

Let $(x_n)$ be any $\omega$-orbit of $T$ which is a Cauchy sequence. Then $(x_n)$ is
eventually increasing, and, by $\omega$-continuity of $T$, we have $\bigsqcup(x_n) \in T(\bigsqcup(x_n))$,
where the supremum is taken over any increasing tail of $(x_n)$. In other words,
we have $\lim x_n \in T(\lim x_n)$, and hence we have the continuity of $T$ that we
require. ∎


Kleene’s theorem for single-valued mappings T asserts that the fixed point 
produced by the usual proof is the least fixed point of T. This assertion does 
not immediately carry over to the case of multivalued mappings $T$ without
additional assumptions. One such simple, though rather strong, condition is
the following: for each $x \in X$, assume that $T(x)$ has a least element $M_x$ and
that $M_x \le M_y$ whenever $x \le y$. To see that this suffices, suppose that $x$ is
any fixed point of $T$, and construct the orbit $(x_n)$ of $T$ by setting $x_0 = \bot$ and
$x_{n+1} = M_{x_n}$ for each $n$. Then $(x_n)$ converges to a fixed point $\bar{x}$. Noting that
$\bot \le x$ and that $M_x \le x$, we see that $x_n \le x$ for all $n$. Hence, $\bar{x} \le x$.


4.14 An Alternative to Multivalued Mappings 


As already noted earlier, multivalued mappings arise naturally as semantic 
operators in relation to disjunctive logic programs. However, William Rounds 
and Guo-Qiang Zhang have shown that the use of multivalued mappings in this 
context can be avoided by employing single-valued mappings defined on power 
domains instead (we refer the reader to [Stoltenberg-Hansen et al., 1994] for 
details of power domains). In fact, this observation is part of a considerable 
programme of research undertaken by the authors just mentioned in the appli- 
cation of domain theory to logic programming. Since their work complements 
that presented here, we intend to make a few remarks about a couple of aspects 
of it, and it is convenient to do this next. 

The starting point of this programme of work is the observation that do- 
mains and logic are strongly related [Zhang, 1991] and that this relationship 
may be used as a foundation for a theory of logic programming based on do- 
main theory. In [Zhang and Rounds, 1997a, Zhang and Rounds, 1997b] and 
[Rounds and Zhang, 2001], Rounds and Zhang use power domains to develop 
a domain-theoretic view of default logic, which they call power defaults. In- 
deed, in this framework logic programs can be viewed in a rather simple way 
as default theories in the sense of [Reiter, 1980]. Default theories constitute 
an important formalism in the area of non-monotonic reasoning, and we refer
the reader to [Bidoit and Froidevaux, 1991, Gelfond and Lifschitz, 1991,
Bochman, 1995, Lifschitz, 2001] and to the references contained in these pa- 
pers for an interesting discussion of the relationship between default logic and 
logic programs. Indeed, from this point of view, the standard models of a dis- 
junctive program, such as the stable model, correspond to extensions in default 
logic: in short, truth in a model corresponds to default theorem. Furthermore, 
Rounds and Zhang [Rounds and Zhang, 2001, Zhang and Rounds, 1997a, 
Zhang and Rounds, 1997b, Zhang and Rounds, 2001] study a version of de- 
fault reasoning from the domain-theoretic point of view. In particular, they 
focus on the Smyth powerdomain by making the observation that the Smyth 
powerdomain can be used to model non-monotonicity. This results, for ex- 
ample, in the implementation of a non-monotonic reasoning system, see 
[Klavins et al., 1998], which bears a significant relationship to other answer
set programming systems which have been investigated with implementation
in mind, see [Lifschitz, 1999, Marek and Truszczyński, 1999]. In addition, in
[Rounds and Zhang, 2001, Zhang and Rounds, 2001], Rounds and Zhang in- 
troduced a domain-theoretic framework for the study of the semantics of logic 
programming, both procedural and non-procedural, including an abstract res- 
olution rule, together with a treatment of negation, which is not negation as (fi- 
nite) failure, however. [Hitzler, 2003a, Hitzler and Wendt, 2003, Hitzler, 2004, 
Hitzler and Krötzsch, 2006] further expand on some aspects of the work of
Rounds and Zhang and in particular relate it to Formal Concept Analysis
[Ganter and Wille, 1999] and to answer set programming.²⁷

Of course, the monotonicity notions for multivalued mappings used mainly 
in this chapter correspond to orderings encountered in power domains. In
particular, this applies to Hoare monotonicity and to Smyth monotonicity. With
this and the comments of the previous paragraph in mind, we note finally that 
in Chapter 6 of [Zhang and Rounds, 2001], a treatment is given of the seman- 
tics of disjunctive logic programs (as considered here) with the same overall 
objective as our own. The treatment is based on the Smyth powerdomain 
again. One important feature of this power-domain approach is that by using 
the right domain, the concept of multivalued function is avoided and continu- 
ity can always be taken to be Scott continuity. Thus, in conclusion, we note 
that overall the developments just described appear to hold out, in particular, 
the possibility of a domain-theoretic treatment of the declarative semantics 
of negation in logic programming and therefore to bring logic programming 
semantics more fully into the realm of domain theory, and vice-versa. 


²⁷ See Footnote 3 in the Introduction.


Chapter 5 


Supported Model Semantics 


Among the various semantics for normal logic programs discussed in Chapter 
2, the supported model semantics, whether in two-valued or in three-valued 
form, is most fundamental: stable and perfect models are two-valued sup- 
ported models; and well-founded and weakly perfect models are three-valued 
supported models. Furthermore, as shown in Theorem 2.6.14, if the Fitting 
model for a program P is total, then P has a unique two-valued supported 
model which coincides with the unique model assigned to P by the Fitting, 
the well-founded, the weakly perfect, and the stable semantics: the semantics 
in this case is unambiguous. 

Programs which have unique supported models together with those which 
have total Fitting models can therefore be considered to be of fundamental 
importance for understanding logic programming semantics as presented in 
Chapter 2. The former, namely, programs with unique supported models, are 
called by us uniquely determined, while we call the latter $\Phi$-accessible
programs. We know from Theorem 2.6.14, as just noted, that every $\Phi$-accessible
program has a unique supported model. The converse, however, is not true in 
general, as the following example shows. 


5.0.1 Program The program


p ← p
p ← ¬p


has a unique supported model $\{p\}$ and Fitting model $\emptyset$.
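
The two claims can be verified by brute force. The Python sketch below assumes that the program consists of the two clauses p ← p and p ← ¬p, and it uses the standard two-valued single-step operator together with a Fitting-style three-valued operator acting on pairs (true atoms, false atoms); the clause encoding and all function names are our own and are meant only as an illustration.

```python
from itertools import chain, combinations

# Clauses as (head, positive body atoms, negated body atoms); an assumed encoding.
PROGRAM = [("p", ("p",), ()),      # p <- p
           ("p", (), ("p",))]      # p <- not p
ATOMS = ("p",)


def T_P(I):
    """Two-valued single-step operator applied to the interpretation I."""
    return frozenset(h for h, pos, neg in PROGRAM
                     if set(pos) <= I and not set(neg) & I)


def supported_models():
    """Supported models are the fixed points of T_P."""
    subsets = chain.from_iterable(
        combinations(ATOMS, r) for r in range(len(ATOMS) + 1))
    return [I for I in map(frozenset, subsets) if T_P(I) == I]


def fitting_step(true, false):
    """One application of a Fitting-style operator to (true atoms, false atoms)."""
    def body_true(pos, neg):
        return set(pos) <= true and set(neg) <= false

    def body_false(pos, neg):
        return bool(set(pos) & false) or bool(set(neg) & true)

    new_true = frozenset(h for h, pos, neg in PROGRAM if body_true(pos, neg))
    new_false = frozenset(a for a in ATOMS
                          if all(body_false(pos, neg)
                                 for h, pos, neg in PROGRAM if h == a))
    return new_true, new_false


def fitting_model():
    """Iterate from the empty three-valued interpretation until it is stable."""
    state = (frozenset(), frozenset())
    while True:
        nxt = fitting_step(*state)
        if nxt == state:
            return state
        state = nxt
```

Here `supported_models()` yields only the interpretation containing p, whereas `fitting_model()` leaves p neither true nor false, which is the sense in which the Fitting model is empty.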


In this chapter, we study supported models in two-valued and three-valued 
logic, with particular emphasis on uniquely determined and $\Phi$-accessible
programs. In particular, in Section 5.1 we consider two-valued supported models
and apply generalized metric fixed-point theorems from Chapter 4 in order 
to show that certain classes of programs are uniquely determined. As is to 
be expected, more general fixed-point theorems allow the treatment of more 
general classes of programs, so that the hierarchy of fixed-point theorems from 
Section 4.7 gives rise to a hierarchy of program classes, each of which has the 
property that all programs in the class have unique supported models. Such 
program classes are consequently called unique supported model classes. 

The same hierarchy of unique supported model classes will be considered 
again in Section 5.2, but this time from the point of view of three-valued
supported models (more precisely by studying variants of Fitting’s $\Phi$-operator).
By analogy with Chapter 2, we will establish a correspondence between se- 
mantics defined, on the one hand, by means of monotonic operators, and 
characterizations given by means of level mappings, on the other hand. As a 
result, we obtain a hierarchy of program classes which extends observations 
from Chapter 2. All this will be carried out in this chapter in Section 5.3. 

Finally, in Section 5.4, we make some brief observations concerning how 
one may approach the results of this chapter from a much more general point 
of view. 


5.1 Two-Valued Supported Models 


We know from Proposition 2.2.6 that the (two-valued) supported mod- 
els for a given program P are exactly the fixed points of the corresponding 
single-step operator $T_P$. From Program 2.2.4, we know that $T_P$ is in general
not monotonic. This fact has the particular consequence that the fixed-point
theorems from Section 1.1 for monotonic operators are not applicable to $T_P$
in this case. The alternative suggested by our development in Chapter 4 is to 
apply, to non-monotonic single-step operators, fixed-point theorems utilizing 
generalized metrics. In particular, it suggests in our current context the ap- 
plication of those theorems which directly generalize the Banach contraction 
mapping theorem to the extent that they ensure uniqueness of the resulting 
fixed points, if any. Of course, if we successfully apply any of these particu- 
lar theorems to a single-step operator, the corresponding program will clearly 
be uniquely determined. It follows, therefore, that any approach of this type 
employing fixed-point theorems which guarantee uniqueness of the resulting 
fixed points cannot, when applied to single-step operators, encompass all (def- 
inite) programs. Program 2.3.1, for example, is definite, but has two supported 
models and, hence, cannot be uniquely determined. 


Throughout the present section, it will be convenient to let Ip denote Ip. 


5.1.1 Acyclic and Locally Hierarchical Programs 


Let us first recall the program Even (Program 2.1.3). Iterates of the
corresponding immediate consequence operator $T_{\mathrm{Even}}$ are easily computed and are
as follows, for all $n \in \mathbb{N}$, see Example 3.3.6.


$$T_{\mathrm{Even}}^{2n+1}(\emptyset) = B_{\mathrm{Even}} \setminus \{\mathrm{even}(s^{2k+1}(0)) \mid 0 \le k < n\}$$

$$T_{\mathrm{Even}}^{2n}(\emptyset) = \{\mathrm{even}(s^{2k}(0)) \mid 0 \le k < n\}$$


We notice that the sequence of iterates is alternating in a certain sense. The
iterates with even numbers successively generate the atoms in the supported
model $M = \{\mathrm{even}(s^{2n}(0)) \mid n \in \mathbb{N}\}$, while the iterates with odd numbers
successively delete those atoms which are not in M. The order in which the 
atoms are generated or deleted is such that atoms with more occurrences of 
the function symbol s are generated or deleted later. This corresponds to the 
structure of the Even program, whose rules reflect this in the sense that the 
atom in the head of a ground instance of the second program clause always 
contains one more function symbol than the corresponding body atom. 
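
The alternation of the iterates can be reproduced mechanically. The following is a minimal sketch, assuming that the Herbrand base of Even is cut off at a hypothetical bound `N` and that the integer `k` stands for the atom even(s^k(0)); the names `t_even` and `iterate` are illustrative and do not come from the text. Since the atom even(s^(k+1)(0)) depends only on even(s^k(0)), the cut-off does not change which atoms below the bound belong to an iterate.

```python
# Truncated Herbrand base for Even: the integer k stands for even(s^k(0)).
N = 12  # cut-off on the number of symbols s (an assumption of this sketch)
BASE = frozenset(range(N + 1))


def t_even(interp):
    """One application of T_Even, restricted to atoms with at most N symbols s."""
    out = {0}  # even(0) <-
    # even(s(X)) <- not even(X): the head is derived when even(X) is absent
    out |= {k + 1 for k in range(N) if k not in interp}
    return frozenset(out)


def iterate(n):
    """Return the n-th iterate of t_even, starting from the empty interpretation."""
    interp = frozenset()
    for _ in range(n):
        interp = t_even(interp)
    return interp


for n in range(4):
    generated = sorted(iterate(2 * n))            # atoms present after 2n steps
    deleted = sorted(BASE - iterate(2 * n + 1))   # atoms missing after 2n+1 steps
    print(f"n={n}: generated {generated}, deleted {deleted}")
```

The even iterates list the atoms generated so far, and the complements of the odd iterates list the atoms deleted so far, in order of increasing term depth.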

The following definition abstracts from this and draws on the observation 
made in the previous paragraph that iterates of the immediate consequence 
operator can in some sense be controlled if there is a strong dependency be- 
tween heads of clauses and their corresponding body atoms. This is a theme 
which will dominate the discussion of this chapter, and the reader may have 
already noticed that it is related to the characterizations of semantics using 
level mappings given in Chapter 2. The precise relationship between these two 
themes will be made more explicit in Section 5.2. 


5.1.1 Definition A normal logic program $P$ is called locally hierarchical¹ if
there exists a level mapping $l : B_P \to \alpha$, for some ordinal $\alpha$, such that for
each clause $A \leftarrow L_1, \ldots, L_n$ in $\mathrm{ground}(P)$ and for all $i = 1, \ldots, n$ we have
$l(A) > l(L_i)$. If $\alpha$ can be chosen here to be $\omega$, then $P$ is called acyclic.²


The Even program is acyclic, as can be seen by defining $l : B_P \to \omega$ by
$l(\mathrm{even}(s^k(0))) = k$ for all $k \in \mathbb{N}$.


5.1.2 Program (ExistsEven) Consider the following program, which ex- 
tends Even. We call it ExistsEven because intuitively, and also when run 
under Prolog, it is a generate-and-test program which tests whether or not 
there exists an even number. 


nat(0) ←
nat(s(X)) ← nat(X)
even(0) ←
even(s(X)) ← ¬even(X)
existsEven ← nat(X), even(X)


¹ Locally hierarchical programs were studied in [Cavedon, 1989]. It was shown in
[Seda and Hitzler, 1999a] that it is possible to compute all partial recursive functions with
locally hierarchical programs under SLDNF-resolution if the use of the meta-logical cut is
allowed.

² Acyclic programs were studied in [Cavedon, 1989, Cavedon, 1991] under the name of
ω-locally hierarchical programs. The notion of acyclicity was introduced in [Bezem, 1989],
and further studies of it concerning termination properties were undertaken in [Bezem, 1989,
Apt and Bezem, 1990].




Certainly, ExistsEven is somewhat pointless as a program. However, it ex- 
hibits the basic idea underlying the generate-and-test programming scheme. 
If Prolog is called with the query 


?- existsEven. 


then the interpreter successively generates all instantiations of nat(X) and 
tests for each instance of X whether or not it falls under the predicate even. 
Obviously, the generator nat and the test even could be replaced by something 
much more sophisticated. 

In ExistsEven, the subprogram consisting of the first four clauses is acyclic
with respect to the level mapping $l$ with $l(\mathrm{even}(s^k(0))) = l(\mathrm{nat}(s^k(0))) = k$
for all $k \in \mathbb{N}$, and we notice that any level mapping with respect to which
this subprogram is acyclic must have an infinite codomain. Consequently,
ExistsEven is not acyclic, but it is locally hierarchical, as can be seen by
extending the level mapping by setting $l(\mathrm{existsEven}) = \omega$.
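
The level condition of Definition 5.1.1 can be checked on a finite part of ground(ExistsEven). The sketch below assumes a hypothetical depth bound `N` on ground terms and uses `math.inf` as a stand-in for the ordinal ω; the helper names are illustrative only. Signs of body literals are dropped, since the condition refers only to the atoms occurring in a body.

```python
import math

N = 6             # cut-off on the depth of ground terms (an assumption of this sketch)
OMEGA = math.inf  # stands in for the ordinal omega


def ground_exists_even(n):
    """Ground clauses (head, body atoms) of ExistsEven for terms s^k(0) with k <= n."""
    clauses = [(("nat", 0), []), (("even", 0), [])]
    for k in range(n):
        clauses.append((("nat", k + 1), [("nat", k)]))    # nat(s(X)) <- nat(X)
        clauses.append((("even", k + 1), [("even", k)]))  # even(s(X)) <- not even(X)
    for k in range(n + 1):
        # existsEven <- nat(X), even(X)
        clauses.append((("existsEven",), [("nat", k), ("even", k)]))
    return clauses


def level(atom):
    """Level mapping from the text: depth k for nat and even, omega for existsEven."""
    return OMEGA if atom == ("existsEven",) else atom[1]


def respects_levels(clauses, lvl):
    """True if every head has a strictly larger level than each of its body atoms."""
    return all(lvl(head) > lvl(b) for head, body in clauses for b in body)


def bounded(atom):
    """A finite level for existsEven; it fails once the body atoms reach depth N."""
    return N if atom == ("existsEven",) else atom[1]


print(respects_levels(ground_exists_even(N), level))
print(respects_levels(ground_exists_even(N), bounded))
```

Any finite value chosen for existsEven is eventually matched by the level of some nat(s^k(0)), which is the reason the text passes to ω.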


We want to apply generalized metric fixed-point theorems from Chapter 4 
to acyclic and locally hierarchical programs, that is, we would like to construct 
a (generalized) metric on the set of all interpretations of a program such that 
the immediate consequence operator of the program satisfies a corresponding 
contractivity property. We follow the construction of Section 4.8.2 with a 
minor modification to suit our present purposes. 


5.1.3 Definition Let $P$ be a normal logic program, and let $l : B_P \to \gamma$ be a
level mapping for $P$. We consider symbols $2^{-\alpha}$ for ordinals $\alpha$, and, essentially
as in Section 4.8.2, define $\Gamma_l = \{2^{-\alpha} \mid \alpha \le \gamma\}$. The set $\Gamma_l$ is again ordered by
$2^{-\alpha} < 2^{-\beta}$ if and only if $\beta < \alpha$, and we denote $2^{-\gamma}$ by 0.

In Sections 4.8.2 and 4.8.3, we used this construction for gums with ordinal
distances, and with the notation established there we have $\Gamma_l = \Gamma_{\gamma+1}$, where
$l : B_P \to \gamma$.

Finally, define a mapping $d_l : I_P \times I_P \to \Gamma_l$ by setting $d_l(I, J) = 0$ if $I = J$,
and, when $I \ne J$, by setting $d_l(I, J) = 2^{-\alpha}$, where $I$ and $J$ differ on some
ground atom of level $\alpha$, but agree on all ground atoms of level $\beta$ satisfying
$\beta < \alpha$.


In case $\gamma = \omega$, we can identify each $2^{-n} \in \Gamma_l$ with the corresponding
negative power of two, that is, $2^{-n} = \frac{1}{2^n} \in \mathbb{R}$ and $2^{-\omega} = 0$, and then $d_l$ takes
values in the set of real numbers.
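
For the case in which all levels are natural numbers, the distance just defined is easy to compute on finite interpretations. The following sketch assumes that interpretations are given as Python sets of atoms and that `level` is a hypothetical function returning the level of an atom; it also checks the strong triangle inequality by brute force on a small base.

```python
from fractions import Fraction
from itertools import combinations


def d_l(i, j, level):
    """Return 2^(-n), where n is the least level on which i and j differ; 0 if i == j."""
    diff = i ^ j  # atoms on which the two interpretations disagree
    if not diff:
        return Fraction(0)
    n = min(level(a) for a in diff)
    return Fraction(1, 2 ** n)


def powerset(atoms):
    """All interpretations (subsets) over a finite collection of atoms."""
    atoms = list(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]


def is_ultrametric(atoms, level):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for all interpretations over atoms."""
    space = powerset(atoms)
    return all(
        d_l(x, z, level) <= max(d_l(x, y, level), d_l(y, z, level))
        for x in space for y in space for z in space
    )


# Atoms are the integers 0..3, with the identity as level mapping.
print(d_l(frozenset({0, 2}), frozenset({0, 3}), lambda k: k))
print(is_ultrametric(range(4), lambda k: k))
```

Exact fractions are used rather than floats so that comparisons between distances are not affected by rounding.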


5.1.4 Proposition Suppose that $P$ is a normal logic program, and that $l$ is
a level mapping. Then the following statements hold.


(a) If $P$ is locally hierarchical with respect to $l$, then $d_l$ is a spherically
complete generalized ultrametric.


(b) If $P$ is acyclic with respect to $l$, then $d_l$ is a complete ultrametric.




Proof: It suffices to prove (a). We will do this by applying Theorem 4.8.14. For
the given level mapping $l$, define the rank function $r_l$ by setting $r_l(\emptyset) = 0$ and
by setting $r_l(I) = \max\{l(A) \mid A \in I\}$ for every non-empty $I \in (I_P)_c$, where
we identify each element of $(I_P)_c$ with a finite subset of $B_P$, as usual. The
generalized ultrametric $d_{r_l}$ induced by $r_l$, as in Definition 4.8.12, is spherically
complete by Theorem 4.8.14. The mappings $d_l$ and $d_{r_l}$ coincide since, for each
$I \in I_P$, we have $I = \sup\{\{A\} \mid A \in I\}$, with the supremum being taken with
respect to subset inclusion. ∎


Under certain conditions similar to those discussed in Section 4.6, we can
recover the Cantor topology from $d_l$.


5.1.5 Proposition Let $P$ be a normal logic program, and let $l : B_P \to \omega$ be
a level mapping such that $l^{-1}(n)$ is finite for each $n \in \mathbb{N}$. Then $d_l$ induces the
Cantor topology $Q$ on $I_P$.


Proof: It is easily shown by using Proposition 3.3.5 that sequences converge
in $Q$ if and only if they converge with respect to $d_l$, and this observation
suffices. ∎


We show finally that the immediate consequence operator satisfies the
required contractivity conditions for applying the Priess-Crampe and Ribenboim
theorem or the Banach contraction mapping theorem, as appropriate.


5.1.6 Theorem Suppose that $P$ is a normal logic program, and that $l$ is a
level mapping. Then the following statements hold.


(a) If $P$ is locally hierarchical with respect to $l$, then $T_P$ is strictly
contracting with respect to $d_l$.


(b) If $P$ is acyclic with respect to $l$, then $T_P$ is a contraction with respect
to $d_l$.


Furthermore, in both cases, $T_P$ has a unique fixed point, and $P$ has a unique
supported model.


Proof: (a) Suppose $I_1, I_2 \in I_P$, and that $d_l(I_1, I_2) = 2^{-\alpha}$ for some ordinal $\alpha$.

Suppose $\alpha = 0$. Let $A \in T_P(I_1)$ with $l(A) = 0$. Since $P$ is locally
hierarchical, $A$ must be the head of a unit clause in $\mathrm{ground}(P)$. From this it follows
that $A \in T_P(I_2)$ also. By the same argument, if $A \in T_P(I_2)$ with $l(A) = 0$,
then $A \in T_P(I_1)$. Therefore, $T_P(I_1)$ and $T_P(I_2)$ agree on all atoms of level less
than 1, and hence we have

$$d_l(T_P(I_1), T_P(I_2)) \le 2^{-1} < 2^{-0} = d_l(I_1, I_2),$$

as required.

Now suppose $\alpha > 0$, so that $I_1$ and $I_2$ differ on some element of $B_P$
with level $\alpha$, but agree on all ground atoms of lower level. Let $A \in T_P(I_1)$
with $l(A) \le \alpha$. Then there is a clause $A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1}$ in
$\mathrm{ground}(P)$, where $k_1, l_1 \ge 0$, such that for all $k, j$ we have $A_k \in I_1$ and
$B_j \notin I_1$. Since $P$ is locally hierarchical and $I_1, I_2$ agree on all atoms of level
less than $\alpha$, it follows that for all $k, j$ we have $A_k \in I_2$ and $B_j \notin I_2$. Therefore,
$A \in T_P(I_2)$. By the same argument, if $A \in T_P(I_2)$ with $l(A) \le \alpha$, then
$A \in T_P(I_1)$. Hence, we have that $T_P(I_1)$ and $T_P(I_2)$ agree on all atoms of
level less than or equal to $\alpha$, and it follows that

$$d_l(T_P(I_1), T_P(I_2)) \le 2^{-(\alpha+1)} < 2^{-\alpha} = d_l(I_1, I_2),$$

as required.

Thus, $T_P$ is strictly contracting, and Theorem 4.3.6 yields that $T_P$ has a
unique fixed point and therefore that $P$ has a unique supported model.

The proof just given is easily adapted to establish (b). The operator $T_P$
turns out to be contractive with contractivity factor $\frac{1}{2}$, and then Theorem
4.2.3 is applied instead of Theorem 4.3.6. ∎


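5.1.7 Example Consider the program Tweety1 from Examples 2.1.2 and
2.2.7. Tweety1 is acyclic with level mapping $l(\mathrm{penguin}(X)) = 0$, $l(\mathrm{bird}(X)) = 1$
and $l(\mathrm{flies}(X)) = 2$ for $X \in \{\mathrm{bob}, \mathrm{tweety}\}$. For $I_0 = \{\mathrm{bird}(\mathrm{tweety})\}$, we
obtain

$$I_1 = T_{\mathrm{Tweety1}}(I_0) = \{\mathrm{penguin}(\mathrm{tweety}), \mathrm{bird}(\mathrm{bob}), \mathrm{flies}(\mathrm{tweety})\},$$

$$I_2 = T_{\mathrm{Tweety1}}(I_1) = \{\mathrm{penguin}(\mathrm{tweety}), \mathrm{bird}(\mathrm{bob}), \mathrm{bird}(\mathrm{tweety}), \mathrm{flies}(\mathrm{bob})\}, \text{ and}$$

$$I_3 = T_{\mathrm{Tweety1}}(I_2) = I_2.$$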


Another example is given by the program Even (Program 2.1.3), as dis- 
cussed at the beginning of Section 5.1.1. 
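
The iterates in Example 5.1.7 can be recomputed directly. The clauses of Tweety1 are not repeated in this section, so the sketch below assumes the following reconstruction, which is consistent with the iterates and the level mapping given in the example: penguin(tweety) ←, bird(bob) ←, bird(X) ← penguin(X), and flies(X) ← bird(X), ¬penguin(X). All Python names are illustrative.

```python
from fractions import Fraction

CONSTS = ["bob", "tweety"]
LEVEL = {"penguin": 0, "bird": 1, "flies": 2}  # level mapping from Example 5.1.7


def t_tweety1(i):
    """Immediate consequence operator for the assumed clauses of Tweety1."""
    out = {("penguin", "tweety"), ("bird", "bob")}  # the two facts
    for x in CONSTS:
        if ("penguin", x) in i:
            out.add(("bird", x))                    # bird(X) <- penguin(X)
        if ("bird", x) in i and ("penguin", x) not in i:
            out.add(("flies", x))                   # flies(X) <- bird(X), not penguin(X)
    return frozenset(out)


def d_l(i, j):
    """Distance of Definition 5.1.3 for the level mapping LEVEL."""
    diff = i ^ j
    if not diff:
        return Fraction(0)
    return Fraction(1, 2 ** min(LEVEL[pred] for pred, _ in diff))


i0 = frozenset({("bird", "tweety")})
i1 = t_tweety1(i0)
i2 = t_tweety1(i1)
i3 = t_tweety1(i2)

for name, interp in (("I1", i1), ("I2", i2), ("I3", i3)):
    print(name, sorted(interp))
print(d_l(i0, i1), d_l(i1, i2), d_l(i2, i3))
```

Under these assumptions, each application of the operator at least halves the distance between successive iterates, in line with the contractivity factor of Theorem 5.1.6 (b).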


5.1.2 Acceptable Programs 


Historically, acyclic programs were introduced in attempts to capture 
procedural properties, such as termination, under SLDNF-resolution, see 
[Bezem, 1989, Apt and Bezem, 1990, Cavedon, 1991]. The basic idea behind 
acyclic programs was extended to take into account the fact that logic pro- 
gramming systems, such as Prolog, evaluate clause bodies from left to right, 
and this led to the acceptable programs³ studied in this section. We will focus
on declarative aspects of acceptable programs here, generalizing the approach 
of Section 5.1.1. 


³ Acceptable programs were introduced by Apt and Pedreschi in AP94. For further
reading concerning termination in resolution-based logic programming, see [Marchiori, 1996,
Apt, 1997, Pedreschi et al., 2002].


5.1.8 Definition Let $P$ be a program, and recall from Section 2.5 that an
atom $A \in B_P$ refers to an atom $B \in B_P$ if $B$ or $\neg B$ occurs as a body literal in
a clause $A \leftarrow \mathrm{body}$ in $P$. We say that $A$ depends on $B$ if the pair $(A, B)$ is in
the transitive closure of the relation refers to. We further denote by $\mathrm{Neg}_P$ the
set of predicate symbols in $P$ which occur in a negative literal in the body of a
clause in $P$, and we set $\mathrm{Neg}^*_P = \mathrm{Neg}_P \cup D$, where $D$ is the set of all predicate
symbols in $P$ on which the predicate symbols in $\mathrm{Neg}_P$ depend. Finally, by
$P^-$ we denote the set of clauses in $P$ whose head contains a predicate symbol
from $\mathrm{Neg}^*_P$.

A program $P$ is called acceptable with respect to some ω-level
mapping $l : B_P \to \omega$ and some interpretation $I \in I_P$ if $I$ is a model for $P$ whose
restriction to the predicate symbols in $\mathrm{Neg}^*_P$ is a supported model for $P^-$, and
the following condition holds. For each ground instance $A \leftarrow L_1, \ldots, L_n$ of a
clause in $P$ and for all $i \in \{1, \ldots, n\}$ we have

$$\text{if } I \models \bigwedge_{j=1}^{i-1} L_j \text{ then } l(A) > l(L_i). \qquad (5.1)$$


The following is an example of an acceptable program. 


5.1.9 Program Let $G$ be an acyclic finite graph. We define the program
Game to be the program consisting of the following clauses.⁴


win(X) ← move(X, Y), ¬win(Y).
move(a, b) ←    for all (a, b) ∈ G


Game is not acyclic. One of the ground instances of the first clause is
win(a) ← move(a, a), ¬win(a), so if Game were acyclic with respect to some
level mapping $l$, we would have $l(\mathrm{win}(a)) < l(\mathrm{win}(a))$, which is impossible.
In order to show that Game is acceptable, we need to find a suitable level
mapping $l$ and a suitable model $I$ for Game. Since $G$ is acyclic and finite, there
exists a function $f$ which assigns a natural number to every vertex of $G$, and
such that for each vertex $a$ the following holds.

$$f(a) = \begin{cases} 0 & \text{if there is no } (a,b) \in G, \\ 1 + \max\{f(b) \mid (a,b) \in G\} & \text{otherwise.} \end{cases}$$

We now define $l$ by setting $l(\mathrm{move}(a,b)) = f(a)$ and $l(\mathrm{win}(a)) = f(a) + 1$
for all vertices $a, b$ of $G$. From acyclicity and finiteness of $G$, we furthermore
obtain that there exists a function $g$ mapping each vertex to $\{0,1\}$ satisfying
the following.

$$g(a) = \begin{cases} 0 & \text{if there is no } (a,b) \in G, \\ 1 - \min\{g(b) \mid (a,b) \in G\} & \text{otherwise.} \end{cases}$$

Finally, let

$$I = \{\mathrm{move}(a,b) \mid (a,b) \in G\} \cup \{\mathrm{win}(a) \mid g(a) = 1\}.$$

It is straightforward to verify that Game is acceptable with respect to $I$ and
$l$.

⁴ This example is taken from [Apt and Pedreschi, 1994]. For further discussion of
programs related to Game, see [Hitzler and Seda, 2003].
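
The verification left to the reader can be carried out mechanically for a concrete graph. The sketch below assumes a small acyclic graph `G` chosen only for illustration, computes $f$, $g$, $l$ and $I$ as defined above, and checks that $I$ is a fixed point of the immediate consequence operator and that condition (5.1) holds for every ground instance of the first clause. The function names are hypothetical.

```python
from functools import lru_cache

# A small acyclic graph, chosen only for illustration.
G = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
VERTICES = sorted({v for edge in G for v in edge})


def successors(a):
    return [b for (x, b) in G if x == a]


@lru_cache(maxsize=None)
def f(a):
    succ = successors(a)
    return 0 if not succ else 1 + max(f(b) for b in succ)


@lru_cache(maxsize=None)
def g(a):
    succ = successors(a)
    return 0 if not succ else 1 - min(g(b) for b in succ)


def level(atom):
    """l(move(a, b)) = f(a) and l(win(a)) = f(a) + 1."""
    return f(atom[1]) if atom[0] == "move" else f(atom[1]) + 1


I = {("move", a, b) for (a, b) in G} | {("win", a) for a in VERTICES if g(a) == 1}


def t_game(interp):
    """Immediate consequence operator of Game over the vertices of G."""
    out = {("move", a, b) for (a, b) in G}
    out |= {("win", x) for x in VERTICES for y in VERTICES
            if ("move", x, y) in interp and ("win", y) not in interp}
    return out


def condition_5_1():
    """Check (5.1) for each ground instance win(x) <- move(x, y), not win(y)."""
    for x in VERTICES:
        for y in VERTICES:
            head, first, second = ("win", x), ("move", x, y), ("win", y)
            if not level(head) > level(first):  # i = 1: the empty prefix always holds
                return False
            if first in I and not level(head) > level(second):  # i = 2
                return False
    return True


print(sorted(I))
print(t_game(I) == I, condition_5_1())
```

For Game, every predicate symbol belongs to $\mathrm{Neg}^*_P$, so the requirement on the restriction of $I$ amounts to $I$ being a fixed point of the operator, which is what `t_game(I) == I` tests.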

We will now show how to construct a complete dislocated metric for any
given acceptable program with respect to which the immediate consequence
operator associated with the program is a contraction. For this purpose, let
$P$ be a program which is acceptable with respect to a level mapping $l$ and
an interpretation $I$. For any $K \in I_P$, we denote by $K'$ the set $K$ restricted
to the predicate symbols in $\mathrm{Neg}^*_P$. Next, we define a function $f : I_P \to \mathbb{R}$ by
setting $f(K) = 0$ if $K \setminus K' \subseteq I$ and, if $K \setminus K' \not\subseteq I$, by setting $f(K) = 2^{-n}$,
where $n \ge 0$ is the smallest integer such that there is an atom $A \in B_P$ with
$l(A) = n$, $A \in K \setminus K'$ and $A \notin I$. Now define a function $u : I_P \to \mathbb{R}$ by setting
$u(K) = \max\{f(K), d_l(K', I')\}$, where $d_l$ is the generalized ultrametric from
Definition 5.1.3.

Finally, for all $J, K \in I_P$, we set⁵

$$\varrho(J, K) = \max\{d_l(J \setminus J', K \setminus K'), u(J), u(K)\}.$$

Thus, for all $J, K \in I_P$, we have

$$\varrho(J, K) = \max\{d_l(J \setminus J', K \setminus K'), f(J), d_l(J', I'), f(K), d_l(K', I')\}.$$

We apply Proposition 4.8.7 in order to show that $\varrho$ is a complete dislocated
ultrametric. We will need the following lemma.


5.1.10 Lemma Let $u(K) = \max\{f(K), d_l(K', I')\}$ for $K \in I_P$. Then $u$ is
continuous as a function from $(I_P, d_l)$ to $\mathbb{R}$.


Proof: Let $(K_m)$ be a sequence in $I_P$ which converges in $d_l$ to some $K \in I_P$.
We need to show that $d_l(K'_m, I')$ converges to $d_l(K', I')$ and that $f(K_m)$
converges to $f(K)$ as $m \to \infty$. Since $(K_m)$ converges to $K$ with respect to
the metric $d_l$, it follows that for each $n \in \mathbb{N}$ there is $m_n \in \mathbb{N}$ such that, for
all $m \ge m_n$, $K$ and $K_m$ agree on all atoms of level less than or equal to $n$.
Suppose that $f(K) = 2^{-n_0}$, say, and that $m \ge m_{n_0}$. Then $K_m$ and $K$ agree
on all atoms of level less than or equal to $n_0$, and it follows that $K'$ and $K'_m$
agree on all atoms of level less than or equal to $n_0$ and, hence, that $K \setminus K'$
and $K_m \setminus K'_m$ agree on all atoms of level less than or equal to $n_0$. Therefore,
we have $f(K_m) = 2^{-n_0} = f(K)$ for all $m \ge m_{n_0}$. Also, if $d_l(K', I') = 2^{-n_0}$,
say, then $d_l(K'_m, I') = 2^{-n_0} = d_l(K', I')$ for all $m \ge m_{n_0}$.

The result now follows. ∎


It remains to show that $T_P$ is a contraction with respect to $\varrho$.


⁵ This approach was inspired by [Fitting, 1994b]. The function $u$ is usually called a weight
function if it is used for constructing dislocated metrics from metrics, see [Matthews, 1992,
Waszkiewicz, 2002]. Here, and in Section 5.1.3, we follow [Seda and Hitzler, 2010].


5.1.11 Theorem Let $P$ be a program which is acceptable with respect to
some level mapping $l$ and interpretation $I$. Then $\varrho$ is a complete dislocated
ultrametric, and $T_P$ is a contraction with respect to $\varrho$. In particular, $P$ has a
unique supported model $M$ and $M = \lim_{n \to \infty} T_P^n(I_0)$ for any $I_0 \in I_P$.


Proof: The mapping $\varrho$ is a complete dislocated ultrametric by Lemma 5.1.10
and Proposition 4.8.7. By Matthews' theorem, Theorem 4.4.6, it remains to
show that $T_P$ is a contraction with respect to $\varrho$. The argument for this is
essentially the same as the slightly more general one in the proof of Theorem
5.1.14, to be given in the next section, so we omit it here. ∎


5.1.3 Φ*-Accessible Programs


We have seen in Section 5.1.2 that application of the Banach contraction 
mapping theorem can be replaced by application of Matthews’ theorem when 
passing from acyclic to acceptable programs. Likewise, the Priess-Crampe and 
Ribenboim theorem can be used in place of Banach’s theorem when passing 
from acyclic to locally hierarchical programs. Naturally, the question arises 
as to whether or not a class of programs can be described which generalizes 
both the acceptable and the locally hierarchical programs such that Theorem 
4.5.1, which generalizes both Matthews’ theorem and the Priess-Crampe and 
Ribenboim theorem, can be applied. We will describe such a class of programs 
in this section. 


5.1.12 Definition A program $P$ is called Φ*-accessible⁶ if and only if there
exists a level mapping $l$ for $P$ and a model $I$ for $P$ whose restriction to $\mathrm{Neg}^*_P$
is a supported model for $P^-$ such that the following condition holds. For each
clause $A \leftarrow L_1, \ldots, L_n$ in $\mathrm{ground}(P)$, either we have $I \models L_1 \wedge \cdots \wedge L_n$ and
$l(A) > l(L_i)$ for all $i = 1, \ldots, n$ or there exists $i \in \{1, \ldots, n\}$ such that $I \not\models L_i$
and $l(A) > l(L_i)$.
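
The clause condition in Definition 5.1.12 can be tested on a finite set of ground clauses. The sketch below is an illustration under stated assumptions: ground clauses are pairs of a head and a list of literals `(atom, positive)`, the example data are the ground instances of Game (Program 5.1.9) for the hypothetical graph with the single edge (a, b), and only the clause condition is checked, not the requirement that $I$ be a model whose restriction is supported.

```python
def holds(interp, literal):
    """Truth of a literal (atom, positive) in a two-valued interpretation."""
    atom, positive = literal
    return (atom in interp) == positive


def clause_condition(head, body, interp, level):
    """Clause condition of Definition 5.1.12 for one ground clause."""
    if all(holds(interp, lit) for lit in body):
        # Body true in I: the head must lie above every body literal.
        return all(level[head] > level[lit[0]] for lit in body)
    # Body false in I: some false literal must lie below the head.
    return any(not holds(interp, lit) and level[head] > level[lit[0]] for lit in body)


# Ground instances of Game for the hypothetical graph G = {(a, b)}.
VERTS = ["a", "b"]
F_RANK = {"a": 1, "b": 0}  # the function f of Program 5.1.9 for this graph
LEVEL = {("move", x, y): F_RANK[x] for x in VERTS for y in VERTS}
LEVEL.update({("win", x): F_RANK[x] + 1 for x in VERTS})
MODEL = {("move", "a", "b"), ("win", "a")}
CLAUSES = [(("move", "a", "b"), [])] + [
    (("win", x), [(("move", x, y), True), (("win", y), False)])
    for x in VERTS for y in VERTS
]


def all_clauses_ok(level):
    return all(clause_condition(h, b, MODEL, level) for h, b in CLAUSES)


print(all_clauses_ok(LEVEL))
print(all_clauses_ok({atom: 0 for atom in LEVEL}))
```

With the level mapping taken from Program 5.1.9 every ground clause passes, whereas a constant level mapping fails on the clause whose body is true in the model.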


As an example, we refer again to the generate-and-test scheme described 
in Program 5.1.2. 


5.1.13 Program Assume that the unary predicate symbols generate and
test are defined via acceptable programs $P_1$ and $P_2$, and consider the program
$P$ which is the union of $P_1$, $P_2$ and the following clause.


success ← generate(X), test(X).


It is easy to see that $P$ is Φ*-accessible: first note that $P_1$ and $P_2$ are
Φ*-accessible with respect to models $I_1$ and $I_2$ and level mappings $l_1$ and $l_2$, say,
with codomain $\omega$. We can assume without loss of generality that $B_{P_1}$ and $B_{P_2}$
are disjoint and do not contain success. Now define $I = I_1 \cup I_2 \cup \{\mathrm{success}\}$
and define $l : B_P \to \omega + 1$ by $l(A) = l_i(A)$, if $A \in B_{P_i}$, and $l(\mathrm{success}) = \omega$.
Then $P$ is easily seen to be Φ*-accessible with respect to $I$ and $l$.

⁶ It was shown in [Hitzler and Seda, 2003] that it is possible to compute all partial
recursive functions with definite Φ*-accessible programs using SLD-resolution.


We continue to carry over the approach of Section 5.1.2; again, we follow
[Seda and Hitzler, 2010]. So let $P$ be a program which is Φ*-accessible with
respect to a level mapping $l : B_P \to \gamma$ and an interpretation $I$. For any
$K \in I_P$, we again denote by $K'$ the set $K$ restricted to the predicate symbols
in $\mathrm{Neg}^*_P$. Again, we define a function $f$ on $I_P$, this time taking values in $\Gamma_l$,
by setting $f(K) = 0$ if $K \setminus K' \subseteq I$ and, if $K \setminus K' \not\subseteq I$, by setting $f(K) = 2^{-\alpha}$,
where $\alpha$ is the smallest ordinal such that there is an atom $A \in B_P$ with
$l(A) = \alpha$, $A \in K \setminus K'$ and $A \notin I$. Now define a function $u : I_P \to \Gamma_l$ by again
setting $u(K) = \max\{f(K), d_l(K', I')\}$, where $d_l$ is the generalized ultrametric
from Definition 5.1.3.

Finally, for all $J, K \in I_P$, we set

$$\varrho(J, K) = \max\{d_l(J \setminus J', K \setminus K'), u(J), u(K)\}$$

as before. Thus, for all $J, K \in I_P$, we have

$$\varrho(J, K) = \max\{d_l(J \setminus J', K \setminus K'), f(J), d_l(J', I'), f(K), d_l(K', I')\}.$$

In fact, the details of the proof of the main result below will be simplified
by introducing the functions $d_1$ and $d_2$, where, for all $J, K \in I_P$, we set
$d_1(J, K) = d_l(J', K')$ and $d_2(J, K) = d_l(J \setminus J', K \setminus K')$. Indeed, in these terms
we have

$$\varrho(J, K) = \max\{d_1(J, I), d_1(K, I), d_2(J, K), f(J), f(K)\}$$

for all $J, K \in I_P$.


5.1.14 Theorem Let $P$ be a Φ*-accessible normal logic program. Then the
space $(I_P, \varrho)$ is a spherically complete, dislocated generalized ultrametric
space, and $T_P$ is strictly contracting with respect to $\varrho$. In particular, $P$ has a
unique supported model.


Proof: It follows from Proposition 4.8.22 that $\varrho$ is a dislocated generalized
ultrametric. For spherical completeness, let $(B_\alpha)$ be a (decreasing) chain of
balls in $I_P$ with centres $I_\alpha$. Let $K$ be the set of all atoms which are eventually
in $I_\alpha$, that is, the set of all $A \in B_P$ such that there exists some ordinal $\beta$ with
$A \in I_\alpha$ for all $\alpha > \beta$. We show that for each ball $B_{2^{-\alpha}}(I_\alpha)$ in the chain, we
have $d_l(I_\alpha, K) \le 2^{-\alpha}$, which suffices to show that $K$ is in the intersection of
the chain. Indeed, it is easy to see by the definition of $\varrho$ that all $I_\beta$ with $\beta > \alpha$
agree on all atoms of level less than $\alpha$. Hence, by definition of $K$ we obtain
that $K$ and $I_\alpha$ agree on all atoms of level less than $\alpha$, as required.

It remains to show that $T_P$ is strictly contracting with respect to $\varrho$, for it
will then follow from Theorem 4.5.1 that the operator $T_P$ has a unique fixed
point, yielding a unique supported model for $P$. In order to show that $T_P$ is
strictly contracting with respect to $\varrho$, we must show that for all $J, K \in I_P$
with $J \ne K$ we have $\varrho(T_P(J), T_P(K)) < \varrho(J, K)$. In particular, the following
results hold.


(a) $d_1(T_P(J), I) < d_1(J, I)$ whenever $d_1(J, I) \ne 0$, and $d_1(T_P(J), I) = 0$
whenever $d_1(J, I) = 0$.


(b) $f(T_P(J)), f(T_P(K)) < \varrho(J, K)$.

(c) $d_2(T_P(J), T_P(K)) < \varrho(J, K)$.


Indeed, it suffices to prove properties (a), (b) and (c), and we proceed to do
this next. For convenience, we identify $\mathrm{Neg}^*_P$ with the subset of $B_P$ containing
predicate symbols from $\mathrm{Neg}^*_P$.

(a) First note that $d_1(T_P(J), I) = d_1(T_{P^-}(J), I)$ since $d_1$ only depends
on the predicate symbols in $\mathrm{Neg}^*_P$. Let $d_1(J, I) = 2^{-\alpha}$. We show that
$d_1(T_{P^-}(J), I) \le 2^{-(\alpha+1)}$. We know that $J'$ and $I'$ agree on all ground atoms
of level less than $\alpha$ and differ on an atom of level $\alpha$. It suffices to show now
that $T_{P^-}(J)'$ and $I'$ agree on all ground atoms of level less than or equal to
$\alpha$.

Let $A$ be a ground atom in $\mathrm{Neg}^*_P$ with $l(A) \le \alpha$, and suppose that $T_{P^-}(J)$
and $I$ differ on $A$. Assume first that $A \in T_{P^-}(J)$ and $A \notin I$. Then there
must be a ground instance $A \leftarrow L_1, \ldots, L_m$ of a clause in $P^-$ such that
$J \models L_1 \wedge \cdots \wedge L_m$. Since $I$ is a fixed point of $T_{P^-}$, and using Definition 5.1.12,
there must also be a $k$ such that $I \not\models L_k$ and $l(L_k) < \alpha$. Note that the
predicate symbol in $L_k$ is contained in $\mathrm{Neg}^*_P$. So we obtain $I \not\models L_k$, $J \models L_k$
and $l(L_k) < \alpha$, which is a contradiction to the assumption that $J$ and $I$
agree on all atoms in $\mathrm{Neg}^*_P$ of level less than $\alpha$. Now assume that $A \in I$ and
$A \notin T_{P^-}(J)$. It follows that there is a ground instance $A \leftarrow L_1, \ldots, L_m$ of
a clause in $P^-$ such that $I \models L_1 \wedge \cdots \wedge L_m$ and $l(A) > l(L_1), \ldots, l(L_m)$ by
Definition 5.1.12. But then $J \models L_1 \wedge \cdots \wedge L_m$ since $J$ and $I$ agree on all
atoms of level less than $\alpha$ and consequently $A \in T_{P^-}(J)$. This contradiction
establishes the first statement in (a). The second statement in (a) follows by
a similar argument, noting that in this case $J' = I'$.

(b) It suffices to show this for $K$. Assume $\varrho(J, K) = 2^{-\alpha}$. We show that
$f(T_P(K)) \le 2^{-(\alpha+1)}$, for which, in turn, we have to show that, for each
$A \in T_P(K)$ not in $\mathrm{Neg}^*_P$ with $l(A) \le \alpha$, we have $A \in I$. Assume that $A \notin I$
for such an $A$. Since $A \in T_P(K)$, there is a ground instance $A \leftarrow L_1, \ldots, L_m$
of a clause in $P$ with $K \models L_1 \wedge \cdots \wedge L_m$. Since $A \notin I$, there must also be a
$k$ with $I \not\models L_k$ and $l(A) > l(L_k)$ by Definition 5.1.12. If the predicate symbol
of $L_k$ belongs to $\mathrm{Neg}^*_P$, then, since $K$ and $I$ agree on all atoms in $\mathrm{Neg}^*_P$ of
level less than $\alpha$, we obtain $K \not\models L_k$, which contradicts $K \models L_1 \wedge \cdots \wedge L_m$.
If the predicate symbol in $L_k$ does not belong to $\mathrm{Neg}^*_P$, then $L_k$ is an atom,
and since $f(K) \le 2^{-\alpha}$, we obtain $I \models L_k$, which is again a contradiction.

(c) Let $\varrho(J, K) = 2^{-\alpha}$, and let $A$ be not in $\mathrm{Neg}^*_P$ with $l(A) \le \alpha$ and
$A \in T_P(J)$. By symmetry, it suffices to show that $A \in T_P(K)$. Since
$A \in T_P(J)$, we must have a ground instance $A \leftarrow L_1, \ldots, L_m$ of a clause in $P$
with $J \models L_1 \wedge \cdots \wedge L_m$. If $I \models L_1 \wedge \cdots \wedge L_m$, then $l(L_k) < l(A) \le \alpha$ for
all $k$, and since $J$ and $K$ agree on all atoms of level less than $\alpha$, we obtain
$K \models L_1 \wedge \cdots \wedge L_m$, and hence $A \in T_P(K)$. If there is some $L_k$ such that $I \not\models L_k$,
then without loss of generality $l(L_k) < l(A) \le \alpha$ by Definition 5.1.12. Now,
if the predicate symbol of $L_k$ belongs to $\mathrm{Neg}^*_P$, then, since $d_1(J, I) \le 2^{-\alpha}$,
we obtain from $J \models L_k$ that $I \models L_k$, which is a contradiction. Also, if the
predicate symbol of $L_k$ does not belong to $\mathrm{Neg}^*_P$, then $L_k$ is an atom, and
since $f(J) \le 2^{-\alpha}$, we obtain $I \models L_k$, again a contradiction. This establishes
(c) and completes the proof. ∎


5.1.4 Φ-Accessible Programs


Definition 5.1.12 of Φ*-accessibility is obviously related to the level mapping
characterization of the Fitting semantics given in Section 2.4. In the
present section, we will carry over the approach from Section 5.1.3 to programs
with a total Fitting model, and we refer the reader to [Hitzler and Seda, 2003]
for further details. The relationships between the different classes of programs
studied so far in this chapter will be further clarified in Section 5.2.


5.1.15 Definition A program is called Φ-accessible if it has a total Fitting
model.


By Corollary 2.4.10, a program $P$ is Φ-accessible if and only if there is a
(two-valued) model $I$ and a (total) level mapping $l$ for $P$ such that $P$ satisfies
(F) with respect to $I \cup \neg(B_P \setminus I)$ and $l$. The restriction of $I$ to $\mathrm{Neg}^*_P$ is
then a supported model for $P^-$, and it follows easily that every Φ*-accessible
program is Φ-accessible. However, the development of Section 5.1.3 does not
generalize without modifications, as the following example shows.


5.1.16 Program Let P be the following program. 


p(s*(x)) = pla) 


p(0) — 
p(s*(0)) — p(s?(0)) 
p(s?(0)) — p(s°(0)) 


The program $P$ is Φ-accessible (and even definite) with respect to the model
$B_P = \{p(s^n(0)) \mid n \in \mathbb{N}\}$ and the level mapping $l : B_P \to \mathbb{N}$ defined by
$l(p(s^n(0))) = n$. Using the dislocated generalized ultrametric $\varrho$ from Section
5.1.3, we obtain for K = {p(s°(0))} and J = {p(s3(0))} that o(K, J) = 273 
and 0(Tp(K),Tp(J)) = 277; thus, Tp is not a contraction relative to o. 


We will modify the methods used in Section 5.1.3 by means of Proposition 
4.8.23. 


5.1.17 Theorem Let $P$ be a Φ-accessible program with model $I$ and level
mapping $l$ such that $P$ satisfies (F) with respect to $I \cup \neg(B_P \setminus I)$ and $l$.
Then $T_P$ is strictly contracting on the spherically complete dislocated
generalized ultrametric space $(I_P, \varrho)$, where for all $J, K \in I_P$ we have
$\varrho(J, K) = \max\{d_l(J, I), d_l(I, K)\}$. In particular, $P$ has a unique supported model.


Proof: By Proposition 4.8.23, we have that $(I_P, \varrho)$ is a spherically complete
dislocated generalized ultrametric space.

In order to show that $T_P$ is strictly contracting, let $J, K \in I_P$, and assume
that $\varrho(J, K) = 2^{-\alpha}$. Then $J$, $K$ and $I$ agree on all ground atoms of level less
than $\alpha$. We show that $T_P(J)$ and $I$ agree on all ground atoms of level less
than or equal to $\alpha$. A similar argument shows that $T_P(K)$ and $I$ agree on all
ground atoms of level less than or equal to $\alpha$, and this suffices.

Let $A \in T_P(J)$ with $l(A) \le \alpha$. Then there must be a clause $A \leftarrow L_1, \ldots, L_n$
in $\mathrm{ground}(P)$ such that $J \models L_1 \wedge \cdots \wedge L_n$. Since $I$ and $J$ agree on all ground
atoms of level less than $\alpha$, (Fii) cannot hold, because if $I \not\models L_i$ with $l(A) >
l(L_i)$, then $J \not\models L_i$ and consequently $J \not\models L_1 \wedge \cdots \wedge L_n$, which is a contradiction.
Therefore, (Fi) holds, and so $A \in T_P(I) = I$. Hence, $A \in I$.

Conversely, suppose that $A \in I$. Since $I = T_P(I)$, there must be a clause
$A \leftarrow L_1, \ldots, L_n$ in $\mathrm{ground}(P)$ such that $I \models L_1 \wedge \cdots \wedge L_n$. Thus, (Fi) must
hold, and so we can assume that $A \leftarrow L_1, \ldots, L_n$ also satisfies $l(A) > l(L_i)$
for $i = 1, \ldots, n$. Since $I$ and $J$ agree on all ground atoms of level less than $\alpha$,
we have $J \models L_1 \wedge \cdots \wedge L_n$, and hence $A \in T_P(J)$, as required.

Applying Theorem 4.5.1 now yields a unique fixed point $M$ of the operator
$T_P$, that is, a unique supported model for $P$. ∎


The proof of Theorem 4.5.1 yields, moreover, that $\varrho(M, M) = 0$ for the
fixed point $M$. Since $\varrho(J, J) = d_l(J, I)$, the only point of $I_P$ which has zero
distance from itself is $I$, and we conclude that $I = M$ is the unique supported
model for $P$. This is somewhat unfortunate,⁷ since $I$ was needed in order to
construct $\varrho$.


5.2 Three-Valued Supported Models 


Recall from Section 2.4 that the three-valued supported models for a program
$P$ are exactly the fixed points of the corresponding Fitting operator,
while the least fixed point of the operator, that is, the least three-valued
supported model for the program, is called its Fitting model. In this section, we
will study variants of the Fitting operator and relate them to the classes of
programs studied in Section 5.1. Thus, in the present section, unless otherwise
noted, interpretations will be three-valued, and therefore $I_P$ here means $I_{P,3}$
ordered using the knowledge ordering introduced in Section 1.3.2.

⁷ We have argued in [Hitzler and Seda, 2003] that self-distance can be understood as a
measure of a priori knowledge, but this needs to be substantiated further.


5.2.1 Fitting Operators Revisited 


We begin with an alternative characterization of the Fitting operator,
which is amenable to generalization in various logics. It involves a program
transformation which we will introduce next. Later on in Section 5.4, we will
consider further generalizations of Fitting operators, called Fitting-style
operators, see Definition 5.4.9.

Let $P$ be a program and suppose that $A \in B_P$ is the head of some clause
in $\mathrm{ground}(P)$. Now let $\{A \leftarrow \mathrm{body}_i \mid i \in \Lambda\}$ be the set of all clauses with head
$A$ in $\mathrm{ground}(P)$, where $\Lambda$ is a suitable index set. We call $A \leftarrow \bigvee_{i \in \Lambda} \mathrm{body}_i$
the pseudo-clause associated with $A$, we call $\mathrm{body}_A = \bigvee_{i \in \Lambda} \mathrm{body}_i$ the body
of the pseudo-clause, and we call $A$ its head. As a matter of notation, we
may sometimes denote $\mathrm{body}_i$ by $C_i$, and hence we may sometimes denote by
$A \leftarrow \bigvee_{i \in \Lambda} C_i$ or even more simply by $A \leftarrow \bigvee C_i$ the pseudo-clause associated
with $A$.

Notice that the family $\{\mathrm{body}_i \mid i \in \Lambda\}$ of bodies may be denumerable and
that $\bigvee_{i \in \Lambda} \mathrm{body}_i$ is formal at this stage. Nevertheless, we next assign truth
values to bodies of pseudo-clauses with respect to an interpretation in certain
three-valued logics⁸ and in more generality in Theorem 5.5.1 and in
Section 7.6. If $\bigvee_{i \in \Lambda} \mathrm{body}_i$ is such a body, then $\mathrm{body}_i$ is a (finite) conjunction for
any $i$ and can be evaluated as usual by means of truth tables for conjunction.
We will consider three different conjunctions and two different disjunctions,
all as given by the truth tables in Table 5.1. Note that $\wedge_1$ and $\vee_1$
are exactly the conjunction and disjunction from Kleene's strong three-valued
logic, specified earlier as a sublogic of Belnap's logic in Table 1.1, and already
employed in Section 2.4 in evaluating truth values of clause bodies.

With respect to $\vee_1$, a disjunction $p \vee_1 q$ is false if and only if both $p$ and
$q$ are false, is true if and only if one of $p$ and $q$ is true, and is undefined
otherwise. We use this as a definition of truth values for bodies of
pseudo-clauses. Therefore, with respect to $\vee_1$:


the body $\bigvee_{i \in \Lambda} \mathrm{body}_i$ of a pseudo-clause is false if and only if
all of the $\mathrm{body}_i$ are false, is true if and only if one of the $\mathrm{body}_i$
is true, and is undefined otherwise.


With respect to $\vee_2$, a disjunction $p \vee_2 q$ is false if and only if both $p$ and $q$
are false, is undefined if one of $p$ and $q$ is undefined, and is true otherwise.
Therefore, with respect to $\vee_2$:


the body $\bigvee_{i \in \Lambda} \mathrm{body}_i$ of a pseudo-clause is false if and only if
all of the $\mathrm{body}_i$ are false, is undefined if and only if one of the
$\mathrm{body}_i$ is undefined, and is true otherwise.


⁸ Strictly speaking, we discuss different truth tables for logical connectives (or rather
different connectives) over three truth values over the same underlying language. It will be
convenient to think in terms of different logics, however.


TABLE 5.1: Several truth tables for three-valued logics.

p  q | p ∧₁ q | p ∧₂ q | p ∧₃ q
u  u |   u    |   u    |   u
u  f |   f    |   u    |   u
u  t |   u    |   u    |   u
f  u |   f    |   f    |   u
f  f |   f    |   f    |   f
f  t |   f    |   f    |   f
t  u |   u    |   u    |   u
t  f |   f    |   f    |   f
t  t |   t    |   t    |   t

p  q | p ∨₁ q | p ∨₂ q
u  u |   u    |   u
u  f |   u    |   u
u  t |   t    |   u
f  u |   u    |   u
f  f |   f    |   f
f  t |   t    |   t
t  u |   t    |   u
t  f |   t    |   t
t  t |   t    |   t

p | ¬p
u | u
f | t
t | f


Finally, if $A$ is an atom which does not appear as the head of a clause in
$\mathrm{ground}(P)$, then we say, by abuse of notation, that $A \leftarrow \bigvee_{i \in \emptyset} \mathrm{body}_i$ is the
pseudo-clause associated with $A$, and we take $\bigvee_{i \in \emptyset} \mathrm{body}_i$ to be false both
with respect to $\vee_1$ and with respect to $\vee_2$. Notice now that every element
$A$ of $B_P$ is the head of the pseudo-clause associated with $A$ and that this
pseudo-clause is uniquely determined by $P$ for a given $A$.
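
The connectives of Table 5.1 are small enough to be written out as functions. The sketch below is one possible encoding, assuming the truth values are represented by the strings "t", "f" and "u"; the function names are illustrative. The disjunctions take a list of values, so that the empty disjunction (the case of an atom that heads no clause) evaluates to false, as stipulated above. Only finite lists are handled, whereas the text allows denumerable families of bodies.

```python
T, F, U = "t", "f", "u"


def and1(p, q):
    """Kleene's strong conjunction (first conjunction column of Table 5.1)."""
    if F in (p, q):
        return F
    return T if (p, q) == (T, T) else U


def and2(p, q):
    """Second column: the result is p unless p is true, in which case it is q."""
    return p if p != T else q


def and3(p, q):
    """Third column: undefined as soon as one argument is undefined."""
    if U in (p, q):
        return U
    return F if F in (p, q) else T


def or1(values):
    """False iff all values are false, true iff one value is true, else undefined."""
    values = list(values)
    if T in values:
        return T
    return F if all(v == F for v in values) else U


def or2(values):
    """False iff all values are false, undefined iff one value is undefined, else true."""
    values = list(values)
    if U in values:
        return U
    return F if all(v == F for v in values) else T


for p in (U, F, T):
    for q in (U, F, T):
        print(p, q, and1(p, q), and2(p, q), and3(p, q), or1([p, q]), or2([p, q]))
```

Printing all nine argument pairs gives a direct way to compare the functions with the rows of the table.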


The following notation will be convenient. Let $A \in B_P$, let $\mathrm{body}_A$ be the
body of the pseudo-clause associated with $A$, and let $I$ be a three-valued
interpretation. Then write $I_{j,k}(\mathrm{body}_A)$ for the truth value, under $I$, of $\mathrm{body}_A$ with
respect to $\wedge_j$ and $\vee_k$, for $j = 1, 2, 3$ and $k = 1, 2$. The following proposition
follows easily from the definitions.
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5.2.1 Proposition Let P be a program, let J be a three-valued interpretation 
for P, and let A € Bp. Then ®p(I)(A) = Iı (body 4), that is, the truth value 
of A under ®p(J) is exactly [1,1 (body ,). 


The logics from Table 5.1 give rise to different operators.’ 


5.2.2 Definition Let P be a program. For any j = 1,2,3 and any k = 1,2,
we define an operator $\Phi_{P,j,k} : I_{P,3} \to I_{P,3}$ by
$\Phi_{P,j,k}(I)(A) = I_{j,k}(\mathit{body}_A)$.
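
For a finite ground program, one application of such an operator can be sketched as follows. This is only an illustration under stated assumptions: a program is a list of pairs (head, body), a body is a list of pairs (atom, is_positive), an interpretation is a dictionary from atoms to "u", "f", or "t", and the name `phi` is hypothetical. The empty conjunction is taken to be true and the empty disjunction false, the latter in line with the convention for atoms that head no clause.

```python
from functools import reduce

NEG = {"u": "u", "f": "t", "t": "f"}

def and1(p, q):
    return "f" if "f" in (p, q) else ("u" if "u" in (p, q) else "t")

def and2(p, q):
    return q if p == "t" else p

def and3(p, q):
    return "u" if "u" in (p, q) else ("t" if (p, q) == ("t", "t") else "f")

def or1(p, q):
    return "t" if "t" in (p, q) else ("u" if "u" in (p, q) else "f")

def or2(p, q):
    return "u" if "u" in (p, q) else ("f" if (p, q) == ("f", "f") else "t")

AND = {1: and1, 2: and2, 3: and3}
OR = {1: or1, 2: or2}

def phi(program, atoms, interp, j=1, k=1):
    """Evaluate every pseudo-clause body with conjunction j and disjunction k."""
    result = {}
    for a in atoms:
        clause_values = []
        for head, body in program:
            if head != a:
                continue
            literals = [interp[x] if pos else NEG[interp[x]] for x, pos in body]
            # "t" is neutral for all three conjunctions.
            clause_values.append(reduce(AND[j], literals, "t"))
        # "f" is neutral for both disjunctions.
        result[a] = reduce(OR[k], clause_values, "f")
    return result

# Example: the single ground clause a <- b, c with b undefined and c false.
program = [("a", [("b", True), ("c", True)])]
atoms = ["a", "b", "c"]
interp = {"a": "u", "b": "u", "c": "f"}
for j in (1, 2, 3):
    print(j, phi(program, atoms, interp, j=j, k=1)["a"])
```

In this example the value assigned to a is f for j = 1 and u for j = 2 and j = 3, which shows how the choice of conjunction changes the outcome for the same clause.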


We can now rephrase Proposition 5.2.1 by saying that the operators $\Phi_{P,1,1}$
and $\Phi_P$ coincide. The following proposition lists properties of the
$\Phi_{P,j,k}$-operators. We use the notation of three-valued interpretations as
signed sets, see Section 1.3.3, and of two-valued interpretations as subsets of $B_P$.


5.2.3 Proposition Let P be a program, and let $I, J, K \in I_{P,3}$. Then the
following hold.


(a) $\Phi_{P,j,k}$ is monotonic for j = 1,2,3 and k = 1,2.

(b) $\Phi_{P,3,k}(I) \subseteq \Phi_{P,2,k}(J) \subseteq \Phi_{P,1,k}(K)$ for k = 1,2 if $I \subseteq J \subseteq K$.

(c) $\Phi_{P,j,2}(I) \subseteq \Phi_{P,j,1}(I)$ for j = 1,2,3.

(d) $\Phi_{P,j,2}(I)^- = \Phi_{P,j,1}(I)^-$.


Proof: (a) The proof of this statement is very similar to that of Proposition
2.4.4 and is therefore omitted.

(b) From the truth tables, it follows that for all $A \in B_P$ and each $k \in \{1,2\}$
we have $I_{3,k}(\mathit{body}_A) \subseteq J_{2,k}(\mathit{body}_A) \subseteq K_{1,k}(\mathit{body}_A)$,
and this suffices.

(c) From the truth tables, we obtain $I_{j,2}(\mathit{body}_A) \subseteq I_{j,1}(\mathit{body}_A)$
for all $A \in B_P$.

(d) By (c), it suffices to show that $\Phi_{P,j,2}(I)^- \supseteq \Phi_{P,j,1}(I)^-$.
So let $A \in B_P$ be such that
$I_{j,1}(\mathit{body}_A) = I_{j,1}(\bigvee_i \mathit{body}_i) = f$. Then
$I_{j,1}(\mathit{body}_i) = f$ for all i, and hence $I_{j,2}(\mathit{body}_i) = f$
for all i. Consequently, $I_{j,2}(\mathit{body}_A) = f$, as required. ∎


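Proposition 5.2.3 shows that the operators are “nested” and that
$\Phi_{P,1,1} = \Phi_P$ is the least sceptical of them. In particular, for each
ordinal $\alpha$ and all j and k, the following hold.


$\Phi_{P,3,k} \uparrow \alpha \subseteq \Phi_{P,2,k} \uparrow \alpha \subseteq \Phi_{P,1,k} \uparrow \alpha$
$\Phi_{P,j,2} \uparrow \alpha \subseteq \Phi_{P,j,1} \uparrow \alpha$


We can also relate $\Phi_P$ to the two-valued immediate consequence operator, $T_P$,
thereby extending Proposition 2.4.13.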


9 We refer to the papers [Hitzler and Seda, 1999a, Hitzler and Seda, 2002b] for further
details concerning the results of this section.


5.2.4 Lemma Let P be a normal logic program, let $I \in I_{P,2}$, and let
$K \in I_{P,3}$ be such that $K^+ \subseteq I \subseteq B_P \setminus K^-$. Then
$\Phi_P(K)^+ \subseteq T_P(I) \subseteq B_P \setminus \Phi_P(K)^-$.
Furthermore, if $K^+ = I = B_P \setminus K^-$, so that K is total, then
$\Phi_P(K)^+ = T_P(I) = B_P \setminus \Phi_P(K)^-$.


Proof: Suppose that $A \in \Phi_P(K)^+$. Then A must be the head of a clause
$A \leftarrow A_1,\ldots,A_{k_1},\neg B_1,\ldots,\neg B_{k_2}$ in ground(P) with
$A_i \in K^+$ and $B_j \in K^-$ for all $i = 1,\ldots,k_1$ and $j = 1,\ldots,k_2$.
By assumption, it follows that for these values of i and j, $A_i \in I$ and
$B_j \notin I$, and hence $A \in T_P(I)$.

For the second inclusion, it suffices to show that
$\Phi_P(K)^- \subseteq B_P \setminus T_P(I)$. Let $A \in \Phi_P(K)^-$. Then, for
every clause of the form $A \leftarrow A_1,\ldots,A_{k_1},\neg B_1,\ldots,\neg B_{k_2}$
in ground(P), we have some $A_i \in K^-$ or some $B_j \in K^+$. Hence, for every
such clause, we have some $A_i \notin I$ or some $B_j \in I$, which implies that
$A \notin T_P(I)$.

The final statement was established in Proposition 2.4.13. ∎
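
The two inclusions of Lemma 5.2.4 can be checked mechanically on a small finite ground program. The sketch below assumes the same hypothetical encoding as before (clauses as pairs (head, body), literals as pairs (atom, is_positive)), represents a three-valued interpretation by its positive and negative parts, and uses the illustrative names `tp` and `phi_p`.

```python
def tp(program, I):
    """Two-valued immediate consequence operator; I is the set of true atoms."""
    return {h for h, body in program
            if all((a in I) == pos for a, pos in body)}

def phi_p(program, atoms, k_pos, k_neg):
    """Fitting operator on a signed set given by its parts (k_pos, k_neg)."""
    # A becomes true if some clause body is true in K.
    pos = {h for h, body in program
           if all((a in k_pos) if p else (a in k_neg) for a, p in body)}
    # A becomes false if every clause with head A has a body literal false in K.
    neg = {a for a in atoms
           if all(any((x in k_neg) if p else (x in k_pos) for x, p in body)
                  for h, body in program if h == a)}
    return pos, neg

# Example program: p <- not q ; q <- r ; no clause for r.
program = [("p", [("q", False)]), ("q", [("r", True)])]
atoms = {"p", "q", "r"}

# K makes r false and leaves p and q undefined.
pos, neg = phi_p(program, atoms, set(), {"r"})

# Every two-valued I with K+ <= I <= B_P \ K- satisfies the lemma's inclusions.
for I in [set(), {"p"}, {"q"}, {"p", "q"}]:
    assert pos <= tp(program, I) <= atoms - neg
```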


The following straightforward corollary provides the essential link between
the $\Phi$-operator, the single-step operator $T_P$, and convergence in Q.


5.2.5 Corollary Let $I_n = T_P^n(I)$ for some $I \in I_{P,2}$, and let
$K_n = \Phi_P \uparrow n$. Then, for all $n \in \mathbb{N}$, we obtain
$K_n^+ \subseteq I_n \subseteq B_P \setminus K_n^-$.


The following is a direct consequence of Lemma 5.2.4. 


5.2.6 Proposition Let P be a normal logic program, and let $(I^+, I^-)$ be a
total three-valued interpretation I for P. Then I is a fixed point of $\Phi_P$ if
and only if $I^+$ is a fixed point of $T_P$. Furthermore, if $\Phi_P$ has exactly
one total fixed point M, then $M^+$ is the unique fixed point of $T_P$.


Proof: Let I be a fixed point of $\Phi_P$. Then
$I^+ \subseteq I^+ \subseteq B_P \setminus I^-$, and by Lemma 5.2.4 we obtain
$I^+ = \Phi_P(I)^+ \subseteq T_P(I^+) \subseteq B_P \setminus \Phi_P(I)^- = B_P \setminus I^- = I^+$.
Conversely, let $I^+$ be a fixed point of $T_P$. By Lemma 5.2.4, we obtain
$\Phi_P(I)^+ = T_P(I^+) = I^+ = B_P \setminus I^- = B_P \setminus \Phi_P(I)^-$,
and therefore $\Phi_P(I)^+ = I^+$ and $\Phi_P(I)^- = I^-$. The last statement now
follows immediately. ∎


Convergence of iterates with respect to the Cantor topology can now be 
described, as follows. 


5.2.7 Proposition Let P be a normal logic program, and assume that
$M = \Phi_P \uparrow \omega$ is total. Then $T_P^n(\emptyset)$ converges in Q to
$M^+$, and $M^+$ is the unique supported model $M_P$ for P.


Proof: Using the notation from Corollary 5.2.5, we obtain $M^+ = \bigcup K_n^+$
and $M^- = \bigcup K_n^-$. Since M is total, we obtain from Propositions 3.3.5
and 5.2.6 that $M^+$ is the limit in Q of the sequence $I_n$. Since totality of
$\Phi_P \uparrow \omega$ implies that it is the unique fixed point of $\Phi_P$, it
therefore equals $(M^+, M^-)$, so that $M^+$ is the unique fixed point of $T_P$ by
Proposition 5.2.6. ∎
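
On a finite ground program the Herbrand base is finite, so convergence in Q simply means that the iterates are eventually constant, and the situation of Proposition 5.2.7 can be observed directly. The following sketch uses the same hypothetical encoding as the earlier examples; the helper `iterate` and its `max_steps` guard are assumptions of this illustration, the guard being needed because iterates of $T_P$ need not stabilize in general.

```python
def tp(program, I):
    """Two-valued immediate consequence operator; I is the set of true atoms."""
    return {h for h, body in program
            if all((a in I) == pos for a, pos in body)}

def phi_p(program, atoms, K):
    """Fitting operator on a signed set K = (positive part, negative part)."""
    k_pos, k_neg = K
    pos = {h for h, body in program
           if all((a in k_pos) if p else (a in k_neg) for a, p in body)}
    neg = {a for a in atoms
           if all(any((x in k_neg) if p else (x in k_pos) for x, p in body)
                  for h, body in program if h == a)}
    return pos, neg

def iterate(step, start, max_steps=100):
    """Apply step until the value repeats; return the list of iterates."""
    seq = [start]
    for _ in range(max_steps):
        nxt = step(seq[-1])
        if nxt == seq[-1]:
            return seq
        seq.append(nxt)
    raise RuntimeError("no fixed point reached within max_steps")

# Example program: p <- not q ; q <- r ; no clause for r.
program = [("p", [("q", False)]), ("q", [("r", True)])]
atoms = {"p", "q", "r"}

phi_seq = iterate(lambda K: phi_p(program, atoms, K), (set(), set()))
tp_seq = iterate(lambda I: tp(program, I), set())

m_pos, m_neg = phi_seq[-1]
print("limit of the Fitting iterates:", m_pos, m_neg)
print("total:", m_pos | m_neg == atoms)
print("limit of the T_P iterates:", tp_seq[-1])
```

Here the limit of the Fitting iterates is total, and its positive part coincides with the limit of the iterates of the immediate consequence operator started at the empty set.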
Proposition 5.2.7 allows us to apply Theorem 4.2.5 in the following way.
Let P be a normal logic program such that $\Phi_P \uparrow \omega$ is total. Then
$T_P^n(I)$ converges in Q to $\Phi_P \uparrow \omega$ for every I, and
$\Phi_P \uparrow \omega$ is the unique fixed point of $T_P$. By Theorem 4.2.5, we
can therefore find a metric with respect to which $T_P$ is a contraction. However,
this metric does not in general coincide with the metric associated with the
dislocated ultrametric $\varrho$ from Theorem 5.1.17, with respect to which $T_P$
is also a contraction under the given condition on P.

The following result is even stronger than Proposition 5.2.7. 


5.2.8 Theorem Let P be a normal logic program, let $j \in \{1,2,3\}$, let
$k \in \{1,2\}$, and assume that $M = \Phi_{P,j,k} \uparrow \alpha$ is total for
some $\alpha$. Then $M^+$ is the unique two-valued supported model for P.
Furthermore, the transfinite sequence $(\Phi_{P,j,k} \uparrow \beta)$ converges in
the Cantor topology to $M^+$.


Proof: By totality of M, Propositions 5.2.3 and 5.2.6, we obtain $M^+$ as a
fixed point of $T_P$. The convergence results follow as in Proposition 5.2.7. ∎


We can extend the treatment of the Fitting operator from Section 2.4 to
the operators $\Phi_{P,j,k}$ introduced in Definition 5.2.2. This will, in turn,
lead us back to the program classes from Section 5.1. We begin with the
$\Phi_{P,3,2}$-operator in the next section.


5.2.2 Acyclic and Locally Hierarchical Programs 


We first present conditions analogous to Definition 2.4.8, which was used to 
characterize the Fitting semantics, beginning with condition (F32) as defined 
next. 


5.2.9 Definition Let P be a normal logic program, let I be a model for P,
and let l be an I-partial level mapping for P. We say that P satisfies (F32) with
respect to I and l if for each $A \in \mathrm{dom}(l)$ and for all clauses
$A \leftarrow L_1,\ldots,L_n$ in ground(P) we have $L_i \in \mathrm{dom}(l)$ and
$l(A) > l(L_i)$ for all $i = 1,\ldots,n$, and furthermore each
$A \in \mathrm{dom}(l)$ satisfies one of the following conditions.


(Fi) $A \in I$, and there is a clause $A \leftarrow L_1,\ldots,L_n$ in ground(P)
such that $L_i \in I$ and $l(A) > l(L_i)$ for all i.


(Fii) $\neg A \in I$, and for each clause $A \leftarrow L_1,\ldots,L_n$ in
ground(P) there exists i with $\neg L_i \in I$ and $l(A) > l(L_i)$.

Conditions (Fi) and (Fii) are identical to those in Definition 2.4.8. The
difference between Definitions 2.4.8 and 5.2.9 lies in the additional very strong
condition “for each $A \in \mathrm{dom}(l)$ and for all clauses
$A \leftarrow L_1,\ldots,L_n$ in ground(P) we have $L_i \in \mathrm{dom}(l)$ and
$l(A) > l(L_i)$ for all $i = 1,\ldots,n$”. The proof of the following theorem is
very similar to the proof of Theorem 2.4.9 and is therefore only sketched.
5.2.10 Theorem Let P be a normal logic program, and let M be the least
fixed point of the operator $\Phi_{P,3,2}$. Then, in the knowledge ordering, M is
the greatest model among all three-valued models I for which there exists an
I-partial level mapping l for P such that P satisfies (F32) with respect to I and l.


Proof: Let $M_P$ be the least fixed point of the operator $\Phi_{P,3,2}$, and
define the $M_P$-partial level mapping $l_P$ as follows: $l_P(A) = \alpha$, where
$\alpha$ is the least ordinal such that A is not undefined in
$\Phi_{P,3,2} \uparrow (\alpha + 1)$. The proof proceeds by establishing the
following facts. (1) P satisfies (F32) with respect to $M_P$ and $l_P$.
(2) If I is a three-valued model for P, and l is an I-partial level mapping such
that P satisfies (F32) with respect to I and l, then $I \subseteq M_P$.

(1) Let $A \in \mathrm{dom}(l_P)$, and suppose that $l_P(A) = \alpha$. We consider
two cases.

Case i. If $A \in M_P$, then Table 5.1 together with the definition of $l_P$ yields
that A satisfies (Fi) with respect to $M_P$ and $l_P$. It also yields that
$l_P(L) < \alpha$ for each literal L in the body of any clause from ground(P) with
head A.

Case ii. If $\neg A \in M_P$, then again Table 5.1 together with the definition of
$l_P$ yields that A satisfies (Fii) with respect to $M_P$ and $l_P$. As before, it
also yields $l_P(L) < \alpha$ for each literal L in the body of any clause from
ground(P) with head A. This completes the proof of (1).

(2) Similarly to the proof of Step (2) in the proof of Theorem 2.4.9, it can
be shown via transfinite induction on $\alpha = l(A)$ that: whenever $A \in I$ we
have $A \in \Phi_{P,3,2} \uparrow (\alpha + 1)$, and whenever $\neg A \in I$ we have
$\neg A \in \Phi_{P,3,2} \uparrow (\alpha + 1)$. This concludes the proof. ∎


5.2.11 Corollary A logic program P is acyclic if and only if
$\Phi_{P,3,2} \uparrow \omega$ is total, and is locally hierarchical if and only if
$\Phi_{P,3,2} \uparrow \alpha$ is total for some ordinal $\alpha$.


Proof: Let P be such that $\Phi_{P,3,2} \uparrow \alpha$ is total for some
$\alpha$. Then by Theorem 5.2.10 and Definition 5.2.9 it follows that P is locally
hierarchical with respect to the level mapping $l_P$ as defined in the proof of
Theorem 5.2.10.

Conversely, let P be locally hierarchical with level mapping l. Then, by
Theorem 5.1.6, P has a unique supported model M, that is, M is the unique
fixed point of the operator $T_P$. We show that P satisfies (F32) with respect to
$I = M \cup \neg(B_P \setminus M)$ and l. For this it suffices to show that for each
$A \in B_P$, conditions (Fi) and (Fii) hold with respect to I. This, however, is an
immediate consequence of the fact that M is a fixed point of $T_P$ and that P is
locally hierarchical.

The argument to show that P is acyclic if and only if
$\Phi_{P,3,2} \uparrow \omega$ is total is similar. ∎
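
For a finite ground program, the level mapping $l_P$ used in the proof of Theorem 5.2.10 can be computed by iterating the operator built from the third conjunction and the second disjunction of Table 5.1 and recording the stage at which each atom first becomes defined. The sketch below assumes the hypothetical encoding used in the earlier examples; the names `phi32` and `level_mapping` are illustrative.

```python
NEG = {"u": "u", "f": "t", "t": "f"}

def phi32(program, atoms, interp):
    """One step of the operator for the third conjunction and second disjunction."""
    out = {}
    for a in atoms:
        clause_values = []
        for head, body in program:
            if head != a:
                continue
            lits = [interp[x] if pos else NEG[interp[x]] for x, pos in body]
            if "u" in lits:
                clause_values.append("u")
            elif all(v == "t" for v in lits):
                clause_values.append("t")
            else:
                clause_values.append("f")
        if "u" in clause_values:
            out[a] = "u"
        elif "t" in clause_values:
            out[a] = "t"
        else:
            out[a] = "f"          # includes the case of no clause at all
    return out

def level_mapping(program, atoms):
    """Iterate from the everywhere-undefined interpretation to the least fixed point.

    An atom gets level n if it is first defined at stage n + 1.
    """
    interp = {a: "u" for a in atoms}
    level = {}
    n = 0
    while True:
        nxt = phi32(program, atoms, interp)
        for a in atoms:
            if nxt[a] != "u" and a not in level:
                level[a] = n
        if nxt == interp:
            return interp, level
        interp, n = nxt, n + 1

# Example program: p <- not q ; q <- r ; no clause for r.
program = [("p", [("q", False)]), ("q", [("r", True)])]
fixed, level = level_mapping(program, ["p", "q", "r"])
print(fixed, level)

# A cyclic program, p <- p, never leaves the undefined state.
cyclic_fixed, cyclic_level = level_mapping([("p", [("p", True)])], ["p"])
print(cyclic_fixed, cyclic_level)
```

In the first example the fixed point is total and the recorded levels decrease strictly from each head to its body atoms, as acyclicity requires; in the second example the fixed point is not total.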
5.2.3 Acceptable Programs


The treatment of Section 5.2.2 carries over to acceptable programs with
only minor modifications. Given a program P, an interpretation $I \in I_{P,3}$, and
an I-partial level mapping l, we say that a clause $A \leftarrow L_1,\ldots,L_n$ is
k-safe (with respect to I and l) if either $L_1,\ldots,L_n \in I$ and
$l(A) > l(L_i)$ for all $i = 1,\ldots,n$, or $\neg L_k \in I$,
$L_1,\ldots,L_{k-1} \in I$ and $l(A) > l(L_i)$ for all $i = 1,\ldots,k$.
This notion generalizes condition (5.1) in Definition 5.1.8 in the following
sense: a program P is acceptable with respect to some $\omega$-level mapping l and
some interpretation $I \in I_{P,2}$ if and only if I is a model for P whose
restriction to the predicate symbols in $\mathit{Neg}_P^*$ is a supported model for
$P^-$, and for each clause in ground(P) there exists k such that the clause is
k-safe (with respect to $I \cup \neg(B_P \setminus I)$ and l).
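
The k-safety condition translates almost literally into a check on a single ground clause. The following sketch assumes that a three-valued interpretation is a dictionary from atoms to "t", "f", or "u", that a partial level mapping is a dictionary defined on some of the atoms, and that the level of a literal is the level of its atom; the function names are hypothetical.

```python
def holds(literal, interp):
    """True if the literal belongs to the signed set described by interp."""
    atom, positive = literal
    return interp.get(atom, "u") == ("t" if positive else "f")

def complement(literal):
    """Map L to its complement: an atom to its negation and vice versa."""
    atom, positive = literal
    return (atom, not positive)

def k_safe(head, body, k, interp, level):
    """Check k-safety of the clause head <- body, with k counted from 1."""
    def below(literals):
        # The head and the given literals must have levels, strictly decreasing.
        return head in level and all(
            a in level and level[head] > level[a] for a, _ in literals)

    # First alternative: the whole body is true and lies below the head.
    if all(holds(L, interp) for L in body) and below(body):
        return True
    if not 1 <= k <= len(body):
        return False
    # Second alternative: L_k is false, L_1..L_{k-1} are true,
    # and L_1..L_k lie below the head.
    return (holds(complement(body[k - 1]), interp)
            and all(holds(L, interp) for L in body[:k - 1])
            and below(body[:k]))

# Example: the clause p <- q, not r, s with q and r true and s undefined.
interp = {"q": "t", "r": "t", "s": "u"}
level = {"p": 3, "q": 1, "r": 2, "s": 5}
body = [("q", True), ("r", False), ("s", True)]
print([k for k in (1, 2, 3) if k_safe("p", body, k, interp, level)])
```

In the example the clause is 2-safe even though the level of s exceeds that of p, because only the first k body literals are constrained in the second alternative.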


5.2.12 Definition Let P be a normal logic program, let I be a model for P,
and let l be an I-partial level mapping for P. We say that P satisfies (F22)
with respect to I and l if, for each $A \in \mathrm{dom}(l)$ and for all clauses in
ground(P) with head A, there exists k such that the clause is k-safe, and
furthermore, each $A \in \mathrm{dom}(l)$ satisfies one of the following conditions.


(Fi) $A \in I$, and there is a clause $A \leftarrow L_1,\ldots,L_n$ in ground(P)
such that $L_i \in I$ and $l(A) > l(L_i)$ for all i.


(Fii) $\neg A \in I$, and for each clause $A \leftarrow L_1,\ldots,L_n$ in
ground(P) there exists i with $\neg L_i \in I$ and $l(A) > l(L_i)$.


The proof of the following theorem is very similar to the proof of Theorem 
5.2.10 and is therefore omitted. 


5.2.13 Theorem Let P be a normal logic program and let M be the least
fixed point of the operator $\Phi_{P,2,2}$. Then, in the knowledge ordering, M is
the greatest model among all three-valued models I for which there exists an
I-partial level mapping l for P such that P satisfies (F22) with respect to I and l.


5.2.14 Corollary A normal logic program P is acceptable if and only if
$\Phi_{P,2,2} \uparrow \omega$ is total.


Proof: Let P be such that $\Phi_{P,2,2} \uparrow \omega$ is total. From Theorem
5.2.8 we know that P has a unique supported model whose restriction to predicate
symbols in $\mathit{Neg}_P^*$ is a supported model for $P^-$. By Theorem 5.2.10 and
Definition 5.2.9, it easily follows that P is acceptable.

The proof of the converse is similar to that of Corollary 5.2.11. ∎
5.2.4 Φ*-Accessible Programs


We next give the analogue of Definition 5.2.9 for Φ*-accessible programs.
In order to make it more concise, we have chosen to rearrange the statements
of the conditions slightly. The reader will easily identify the parts which
correspond to conditions (Fi) and (Fii).


5.2.15 Definition Let P be a normal logic program, let I be a model for P,
and let l be an I-partial level mapping for P. We say that P satisfies (F12) with
respect to I and l if for each $A \in \mathrm{dom}(l)$ and for all clauses
$A \leftarrow L_1,\ldots,L_n$ one of the following conditions (F12i), (F12ii)
holds. Furthermore, if $A \in I$, there must be at least one clause which satisfies
(F12i), and if $\neg A \in I$, there must be no clauses which satisfy (F12i).


(F12i) $L_i \in I$ and $l(A) > l(L_i)$ for all i.
(F12ii) There exists i with $\neg L_i \in I$ and $l(A) > l(L_i)$.


The proof of the following theorem is very similar to the proof of Theorem 
5.2.10 and is therefore omitted. 


5.2.16 Theorem Let P be a normal logic program, and let M be the least
fixed point of the operator $\Phi_{P,1,2}$. Then, in the knowledge ordering, M is
the greatest model among all three-valued models I for which there exists an
I-partial level mapping l for P such that P satisfies (F12) with respect to I and l.


The proof of the following corollary is similar to the proof of Corollary 
5.2.14 and is therefore omitted. 


5.2.17 Corollary A normal logic program P is Φ*-accessible if and only if
$\Phi_{P,1,2} \uparrow \alpha$ is total for some ordinal $\alpha$.


5.2.5 Φ-Accessible Programs


Results for Φ-accessible programs corresponding to those for Φ*-accessible
programs in Section 5.2.4 have already been obtained, and we refrain from
repeating them here. Theorem 5.2.16 finds its analogue in Theorem 2.4.9, and
the analogue of Corollary 5.2.17 can be found in Definition 5.1.15.


5.3 A Hierarchy of Logic Programs 


In Figure 5.1, we present an overview of the relationships between the
main classes of normal logic programs discussed in this book. Note that
different branches of the graph shown are not necessarily disjoint. For example,
a program can be locally hierarchical without being acyclic, but still have a
Q-continuous immediate consequence operator, meaning that its immediate
consequence operator is continuous in the topology Q.


FIGURE 5.1: The main classes of programs discussed in this book. The arrows
indicate class inclusion. See the main text of Section 5.3 for further details.


Covered programs are defined in Definition 7.5.4. Figure 5.1 indicates that 
every acyclic program is covered, but note that this is only the case if we 
assume that the underlying language contains at least one function symbol. 
Indeed, if this is not the case, then the Herbrand base is finite, and, for ex- 
ample, the program P with the single clause 


$q(a) \leftarrow p(x)$


is acyclic¹⁰ but not covered. Q-continuity of the immediate consequence operator
for covered programs follows from Corollary 5.4.8. Scott continuity of the
immediate consequence operator for definite programs follows from Theorem
2.2.3 (it was called order continuity there).

The remaining relationships shown in Figure 5.1 follow from results in 
Chapter 2 and Section 5.2. 


10 For example, assume a is the only constant symbol. Then ground(P) is
$q(a) \leftarrow p(a)$, and so P is obviously acyclic.


5.4 Consequence Operators and Fitting-Style Operators


We close this section by discussing some natural extensions of certain ear- 
lier results. These are obtained by defining a rather general semantic operator 
T modelled on Fitting operators, but defined over abstract finite logics $\mathcal{T}$
rather than over logics containing two, three, or four elements. We call the
resulting operators consequence operators, and an important special case of 
them we call Fitting-style operators. Our main result here is a careful anal- 
ysis of the continuity of these operators T in the Cantor topology Q, which 
yields necessary and sufficient conditions for the continuity in Q of the single- 
step operator as a special case, see Theorem 5.4.11. Once these results are 
established, the aforementioned extensions we require are straightforward to 
present. 

Thus, let $\mathcal{T}$ denote a finite set $\{t_1,\ldots,t_n\}$ of truth values
containing at least the two distinguished values $t_1$ and $t_n$, which are
interpreted as being the truth values for “false” and for “true”, respectively. We
assume that we have truth tables for the usual connectives $\vee$, $\wedge$,
$\leftarrow$, and $\neg$. Given a normal logic program P, we denote the set of all
(Herbrand) interpretations or valuations in this logic by $I_{P,n}$; thus,
$I_{P,n}$ is the set of all functions $I : B_P \to \mathcal{T}$. If n is clear from
the context, we will use the notation $I_P$ instead of $I_{P,n}$, and we note that
this usage is consistent with that already established for n = 2, 3, and 4. As
usual, any interpretation I can be extended, using the truth tables, to give a
truth value in $\mathcal{T}$ to any variable-free formula in the language
$\mathcal{L}$ underlying P. We assume throughout this section that our underlying
language $\mathcal{L}$ contains at least one function symbol, and hence $B_P$ is
denumerable. Finally, we endow $I_{P,n}$ with the Cantor topology Q studied in
Chapter 3, see Theorem 3.3.1, and recall that this is the product topology of
$B_P$ copies of the discrete topology on $\mathcal{T}$. We refer the reader to
Theorem 3.3.4 and Proposition 3.3.9 for a summary of the properties of Q. We note
that our present assumptions that $B_P$ is denumerable and that $\mathcal{T}$ is
finite mean that Q is second countable.

We proceed next with introducing a rather general notion of semantic oper- 
ator T which subsumes many of the particular operators we have encountered 
in the earlier chapters. As already noted, our main objective here is to study 
the continuity of T in the topology Q.¹¹


5.4.1 Definition An operator T on $I_P$ is called a consequence operator for
P if for every $I \in I_P$ the following condition holds: for every clause
$A \leftarrow \mathit{body}$ in ground(P), where $T(I)(A) = t_i$ and
$I(\mathit{body}) = t_j$, say, we have that the truth table for $\leftarrow$ yields
the truth value $t_n$, that is, true, for $t_i \leftarrow t_j$.
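
In the special case of classical two-valued logic, the defining condition of a consequence operator can be tested exhaustively on a finite ground program. The sketch below is an illustration under stated assumptions: truth values are Python booleans, the implication table is the classical one, interpretations are dictionaries from atoms to booleans, and all function names are hypothetical.

```python
from itertools import product

def body_true(body, interp):
    """Classical truth value of a conjunction of literals (atom, is_positive)."""
    return all(interp[a] == pos for a, pos in body)

def tp(program, atoms, interp):
    """Two-valued immediate consequence operator as a map on valuations."""
    return {a: any(h == a and body_true(b, interp) for h, b in program)
            for a in atoms}

def implies(head_value, body_value):
    """Classical truth table for head <- body."""
    return head_value or not body_value

def is_consequence_operator(T, program, atoms):
    """Check the condition of Definition 5.4.1 for every two-valued interpretation."""
    for values in product([False, True], repeat=len(atoms)):
        interp = dict(zip(atoms, values))
        image = T(program, atoms, interp)
        for head, body in program:
            if not implies(image[head], body_true(body, interp)):
                return False
    return True

# Example program: p <- not q ; q <- r.
program = [("p", [("q", False)]), ("q", [("r", True)])]
atoms = ["p", "q", "r"]

def constant_false(program, atoms, interp):
    """An operator that ignores the program; used only as a counterexample."""
    return {a: False for a in atoms}

print(is_consequence_operator(tp, program, atoms))
print(is_consequence_operator(constant_false, program, atoms))
```

The immediate consequence operator passes the check, whereas the operator that maps every interpretation to the everywhere-false valuation fails as soon as some clause body is true.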


11 We refer the reader to [Hitzler et al., 2004] for further details concerning the material
of this section.


It turns out that this notion of consequence operator relates nicely to Q,
yielding the following result.


5.4.2 Theorem If T is a consequence operator for P and if for any $I \in I_P$
we have that the sequence of iterates $T^m(I)$ converges in Q to some $M \in I_P$,
then M is a model for P in the sense that every clause in ground(P) evaluates
to $t_n$ under M. Furthermore, continuity of T yields that M is a fixed point of T.


Proof: Suppose that $A \in B_P$ and that $M(A) = t_i$, and let
$A \leftarrow \mathit{body}$ belong to ground(P), where body has the form
$A_1,\ldots,A_m,\neg B_1,\ldots,\neg B_{m'}$. Then eventually
$T(T^k(I))(A) = t_i$. Suppose
$M(A_1 \wedge \cdots \wedge A_m \wedge \neg B_1 \wedge \cdots \wedge \neg B_{m'}) = t_j$,
say. Taking the sequence $T^k(I)$, we have, by the property stated in the
hypothesis (applied to each literal in the conjunction under consideration),
that eventually
$T^k(I)(A_1 \wedge \cdots \wedge A_m \wedge \neg B_1 \wedge \cdots \wedge \neg B_{m'}) = M(A_1 \wedge \cdots \wedge A_m \wedge \neg B_1 \wedge \cdots \wedge \neg B_{m'}) = t_j$.
Since
$T(T^k(I))(A) \leftarrow T^k(I)(A_1 \wedge \cdots \wedge A_m \wedge \neg B_1 \wedge \cdots \wedge \neg B_{m'})$
is $t_n$ by the fact that T is a consequence operator, we obtain that
$M(A \leftarrow A_1 \wedge \cdots \wedge A_m \wedge \neg B_1 \wedge \cdots \wedge \neg B_{m'}) = t_n$,
as required. If T is continuous, then
$M = \lim T^{n+1}(I) = T(\lim T^n(I)) = T(M)$. ∎


Intuitively, consequence operators propagate “truth” along the implication 
symbols occurring in the program. From this point of view, we would like the 
outcome of the truth value of such a propagation to be dependent only on the 
relevant clause bodies. The next definition captures this intuition. 


5.4.3 Definition Let $A \in B_P$, and denote by $\mathcal{B}_A$ the set of all body
atoms of clauses with head A which occur in ground(P). A consequence operator T
is called (P-)local if for every $A \in B_P$ and any two interpretations
$I, K \in I_P$ which agree on all atoms in $\mathcal{B}_A$, we have
$T(I)(A) = T(K)(A)$.


It is our desire to study continuity in Q of local consequence operators. 
Since Q is a product topology, it is reasonable to expect that finiteness con- 
ditions will play a role in this context, as already observed in Section 3.3. 


5.4.4 Definition Let C be a clause in P, and let $A \in B_P$ be such that A
coincides with the head of C. The clause C is said to be of finite type relative
to A if C has only finitely many different ground instances with head A. The
program P will be said to be of finite type relative to A if each clause in P is
of finite type relative to A, that is, if the set of all clauses in ground(P) with
head A is finite. Finally, P will be said to be of finite type if P is of finite type
relative to A for every $A \in B_P$.


A local variable is a variable which appears in a clause body, but not in
the corresponding head.¹² It is easy to see that in the context of Herbrand
interpretations and if function symbols are present, then the absence of local
variables is equivalent to a program being of finite type.


12 Local variables appear naturally in implementations, but their occurrence is awkward
from the point of view of semantics, especially if they occur in negated body literals since
this leads to the so-called floundering problem, see [Lloyd, 1987, Apt and Pedreschi, 1994].
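
The connection between local variables and finite type can be illustrated by counting ground instances. The sketch below takes the clause $q(a) \leftarrow p(x)$, in which x is local, and assumes a language with one constant a and one unary function symbol s; both the function symbol and the helper names are hypothetical choices made for this illustration. Terms are enumerated only up to a given nesting depth.

```python
def herbrand_terms(depth):
    """Ground terms a, s(a), s(s(a)), ... up to the given nesting depth."""
    terms, term = [], "a"
    for _ in range(depth + 1):
        terms.append(term)
        term = f"s({term})"
    return terms

def ground_instances(depth):
    """Ground instances of q(a) <- p(x) with x ranging over the terms above.

    Every instance has the same head q(a), because x does not occur in the head.
    """
    return [("q(a)", f"p({t})") for t in herbrand_terms(depth)]

for depth in (0, 1, 2, 3):
    print(depth, len(ground_instances(depth)))
```

The number of ground instances with head q(a) grows with the depth bound, so over the full Herbrand universe the clause has infinitely many ground instances with this head and is not of finite type relative to q(a).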


5.4.5 Proposition Let P be a normal logic program of finite type, and let
T be a local consequence operator for P. Then T is continuous in Q.


Proof: Let $I \in I_P$ be an interpretation, let $G_2 = \mathcal{G}(A, t_i)$ be a
subbasic neighbourhood of T(I) in Q, and note that $G_2$ is the set of all
$K \in I_P$ such that $K(A) = t_i$. We need to find a neighbourhood $G_1$ of I
such that $T(G_1) \subseteq G_2$. Since P is of finite type, the set
$\mathcal{B}_A$ is finite. Hence, the set
$G_1 = \bigcap_{B \in \mathcal{B}_A} \mathcal{G}(B, I(B))$ is a finite intersection
of open sets and is therefore open. Since each $K \in G_1$ agrees with I on
$\mathcal{B}_A$, we obtain $T(K)(A) = T(I)(A) = t_i$ for each $K \in G_1$ by
locality of T. Hence, $T(G_1) \subseteq G_2$. ∎


Now, if P is not of finite type, but we can ensure by some other property
of P that the, possibly infinite, intersection
$\bigcap_{B \in \mathcal{B}_A} \mathcal{G}(B, I(B))$ is open, then the above proof
will carry over to programs which are not of finite type, but satisfy the property
we seek. Alternatively, we would like to be able to disregard the infinite
intersection entirely under conditions which ensure that we have to consider
finite intersections only, as in the case of a program of finite type.
The following definition is, therefore, quite a natural one to make.


5.4.6 Definition Let P be a logic program, and let T be a consequence
operator on $I_P$. We say that T is (P-)locally finite for $A \in B_P$ and
$I \in I_P$ if there exists a finite subset $S = S(A, I) \subseteq \mathcal{B}_A$
such that we have $T(J)(A) = T(I)(A)$ for all $J \in I_P$ which agree with I on S.
We say that T is (P-)locally finite if it is locally finite for all $A \in B_P$
and all $I \in I_P$.


Obviously, any locally finite consequence operator is local. Conversely, a
local consequence operator for a program of finite type is locally finite. This
follows from the observation that, for a program of finite type, the sets
$\mathcal{B}_A$, for any $A \in B_P$, are finite. But a much stronger result holds.


5.4.7 Theorem A local consequence operator is locally finite for all $A \in B_P$
and some $I \in I_P$ if and only if it is continuous at I in Q.


Proof: Let T be a locally finite consequence operator, let $I \in I_P$, let
$A \in B_P$, and let $G_2 = \mathcal{G}(A, T(I)(A))$ be a subbasic neighbourhood of
T(I) in Q. Since T is locally finite, there is a finite set
$S \subseteq \mathcal{B}_A$ such that $T(J)(A) = T(I)(A)$ for all
$J \in \bigcap_{B \in S} \mathcal{G}(B, I(B))$. By finiteness of S, the set
$G_1 = \bigcap_{B \in S} \mathcal{G}(B, I(B))$ is an open neighbourhood of I, and
by the choice of S we have $T(G_1) \subseteq G_2$, and this suffices for
continuity of T at I.

For the converse, assume that T is continuous at I in Q, and let $A \in B_P$
be chosen arbitrarily. Then $G_2 = \mathcal{G}(A, T(I)(A))$ is a subbasic open
neighbourhood of T(I), so that, by continuity of T, there exists a basic open
neighbourhood
$G_1 = \mathcal{G}(B_1, I(B_1)) \cap \cdots \cap \mathcal{G}(B_k, I(B_k))$ of I
with $T(G_1) \subseteq G_2$. In other words, we have $T(J)(A) = T(I)(A)$ for each
$J \in \bigcap_{B \in S'} \mathcal{G}(B, I(B))$, where $S' = \{B_1,\ldots,B_k\}$ is
a finite set. Since T is local, the value of $T(J)(A)$ depends only on the values
$J(B)$ of atoms $B \in \mathcal{B}_A$. So if we set $S = S' \cap \mathcal{B}_A$,
then $T(J)(A) = T(I)(A)$ for all $J \in \bigcap_{B \in S} \mathcal{G}(B, I(B))$,
which is to say that T is locally finite for A and I. Since A was chosen
arbitrarily, we obtain that T is locally finite for I and all $A \in B_P$. ∎


The following corollary provides a sufficient condition¹³ for continuity in Q.


5.4.8 Corollary Let P be a program, let T be a local consequence operator,
and let $l : B_P \to \omega$ be a level mapping for P with the property that
$l^{-1}(n)$ is finite for any $n \in \omega$ and such that the following property
holds: for each $A \in B_P$ there exists an $n_A \in \omega$ satisfying
$l(B) < n_A$ for all $B \in \mathcal{B}_A$. Then T is continuous in Q.


Proof: It follows easily from the given conditions that $\mathcal{B}_A$ is finite
for all $A \in B_P$, and hence T is locally finite. ∎


We turn now to the study of a particular type of local consequence op- 
erator, which we call a Fitting-style operator, and its continuity. Recall from 
Section 5.2.1 that bodies of pseudo-clauses may consist of infinite “disjunc- 
tions”, but this will not pose any particular difficulties with respect to the 
logics we are going to discuss. We note that a program P is of finite type if 
and only if all bodies of all pseudo-clauses in P are finite. 

Now, if we are given (suitable) truth tables for negation, conjunction, and 
disjunction, then we are able to evaluate the truth values of bodies of pseudo- 
clauses relative to given interpretations, as was done in Section 5.2.1. 


5.4.9 Definition Let P be a normal logic program. Define the mapping
$F_P : I_{P,n} \to I_{P,n}$ relative to a given (suitable) logic with n truth values
by $F_P(I) = J$, where J assigns to each $A \in B_P$ the truth value
$I(\bigvee C_i)$ of the body $\bigvee C_i$ of the pseudo-clause
$A \leftarrow \bigvee C_i$ with head A.


We call operators which satisfy Definition 5.4.9 Fitting-style operators or
the $F_P$-operator. If we impose the mild assumption that $t_j \leftarrow t_j$
evaluates to true for every j with respect to the underlying logic, then we
immediately obtain that every Fitting-style operator is a local consequence
operator. We will impose this condition, namely, that $t_j \leftarrow t_j$
evaluates to true for every j, for the remainder of this section.

If the chosen logic is classical two-valued logic, then the corresponding
Fitting-style operator is the immediate consequence operator $T_P$ (for a given
program P). Now, if $T_P(I)(A) = t$, then there exists a clause
$A \leftarrow \mathit{body}$ in ground(P) such that $I(\mathit{body})$ is true, and
we obtain $T_P(J)(A) = t$ whenever $J(\mathit{body}) = t$. The observation that
bodies of clauses are finite conjunctions leads us to conclude the following lemma.


13 Communicated to us by Howard A. Blair.


5.4.10 Lemma If $T_P(I)(A) = t$, then $T_P$ is locally finite for A and I.
Furthermore, $T_P$ is continuous at I if and only if it is locally finite for all
A with $T_P(I)(A) = f$.


A body $\bigvee C_i$ of a pseudo-clause with head A is false in classical logic if
and only if all the $C_i$ are false. Since $T_P$ is a Fitting-style operator, we
obtain $T_P(I)(A) = f$ if and only if all the $C_i$ are false. If we require $T_P$
to be locally finite for A and I, then there must be a finite set
$S \subseteq \mathcal{B}_A$ such that any $J \in I_P$ which agrees with I on S
renders all the $C_i$ false. Conversely, if $S \subseteq \mathcal{B}_A$ is
a finite set such that any $J \in I_P$ which agrees with I on S renders all the
$C_i$ false, then $T_P$ is locally finite for A and I. We have just established the
following theorem.¹⁴


5.4.11 Theorem Let P be a normal logic program, and let $I \in I_P$. Then
$T_P$ is continuous in Q at I if and only if whenever $T_P(I)(A) = f$, then
either there is no clause with head A or there exists a finite set
$S(I, A) = \{A_1,\ldots,A_k,B_1,\ldots,B_m\} \subseteq B_P$ with the following
properties.


(a) $I(A_i) = t$ and $I(B_j) = f$ for all i and j.


(b) For every clause $A \leftarrow \mathit{body}$ in ground(P) at least one
$\neg A_i$ or at least one $B_j$ occurs in body.


In the case of Kleene’s strong three-valued logic, we obtain the following 
lemma. 


5.4.12 Lemma If $\Phi_P(I)(A) = t$, then $\Phi_P$ is locally finite for A and I.
Furthermore, $\Phi_P$ is continuous if and only if it is locally finite for all A
and I with $\Phi_P(I)(A) \in \{u, f\}$.


Similar considerations apply to the Fitting-style operators from Section
5.2.1.¹⁵ We mention in passing that the non-monotonic Gelfond-Lifschitz
operator is not a consequence operator in the sense discussed here, and attempts
to characterize the continuity of it involve different methods, some of which
will be studied in Chapter 6.

We will finally provide a generalization of Theorem 5.1.6 for acyclic
programs. So let P be acyclic with level mapping l, and let T be a local
consequence operator for P. Again, we define the mapping
$d : I_P \times I_P \to \mathbb{R}$ by $d(I, J) = 2^{-n}$, where n is least such
that I and J differ on some atom A with $l(A) = n$, see Definition 5.1.3 and the
remarks following it. It follows from Propositions 4.3.7 and 5.1.4 that d is a
complete ultrametric on $I_P$, a fact which is easily verified directly.


14 A direct proof without using the notion of local finiteness was given in [Seda, 1995].
15 The operator $\Psi_P$ defined by means of Belnap’s four-valued logic, see [Fitting, 2002,
Clifford and Seda, 2000], for example, is also a Fitting-style operator.


5.4.13 Proposition With the hypotheses stated in the previous paragraph,
any local consequence operator T is a contraction with respect to d.


Proof: Suppose $d(I, J) = 2^{-n}$. Then I and J coincide on all atoms of level
less than n. Now let $A \in B_P$ with $l(A) = n$. Then by acyclicity of P we
have that all atoms in the body of the pseudo-clause with head A are of
level less than n, and by locality of T we have that $T(I)(A) = T(J)(A)$. So
$d(T(I), T(J)) \leq 2^{-(n+1)}$. ∎


We finally obtain the following theorem. 


5.4.14 Theorem Let P be an acyclic program, and let T be a local
consequence operator for P. Then, for any $I \in I_P$, we have that $T^n(I)$
converges in Q to the unique fixed point of T.


Proof: Since d is a complete metric, we can apply Proposition 5.4.13 and
the Banach contraction mapping theorem. This yields convergence of $T^n(I)$
in d to a unique fixed point M of T. By definition of d, the convergence of the
sequence of valuations $T^n(I)$ to M is pointwise and, hence, is also convergence
in Q. ∎
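
The contraction property and the resulting convergence can be observed on a small acyclic ground program. The following sketch takes the two-valued immediate consequence operator as the local consequence operator and assumes the hypothetical encoding of the earlier examples, with interpretations as dictionaries from atoms to booleans; the program, the level mapping, and the function names are illustrative choices.

```python
from itertools import product

# Acyclic example program: p <- not q ; q <- r ; no clause for r.
program = [("p", [("q", False)]), ("q", [("r", True)])]
level = {"r": 0, "q": 1, "p": 2}      # levels strictly decrease from head to body
atoms = list(level)

def tp(interp):
    """Two-valued immediate consequence operator as a map on valuations."""
    return {a: any(h == a and all(interp[x] == pos for x, pos in body)
                   for h, body in program)
            for a in atoms}

def distance(I, J):
    """d(I, J) = 2^(-n), n the least level on which I and J differ; 0 if I = J."""
    differing = [level[a] for a in atoms if I[a] != J[a]]
    return 0.0 if not differing else 2.0 ** (-min(differing))

def is_contraction(factor=0.5):
    """Check d(T(I), T(J)) <= factor * d(I, J) for all pairs of valuations."""
    interps = [dict(zip(atoms, v))
               for v in product([False, True], repeat=len(atoms))]
    return all(distance(tp(I), tp(J)) <= factor * distance(I, J)
               for I in interps for J in interps)

def fixed_point(start):
    """Iterate from start; termination is guaranteed here by the contraction."""
    current = start
    while tp(current) != current:
        current = tp(current)
    return current

print(is_contraction())
print(fixed_point({a: True for a in atoms}))
```

Starting from any of the eight valuations, the iteration reaches the same fixed point, in which p is true and q and r are false.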


Theorem 5.4.14 is remarkable since the existence of a fixed point of the 
given semantic operator can be guaranteed without any particular or further 
knowledge about the underlying multivalued logic. 


5.5 Measurability Considerations 


As we shall see in Chapter 7, continuity in Q of Fitting-style operators 
Fp, and Tp in particular, is central in relation to whether or not we can 
compute them approximately by neural networks. However, in the context 
of approximate computation by neural networks, the weaker notion of mea- 
surability has some interest, although rather less than that of continuity, see 
[Hornik et al., 1989], for example. Thus, we shall close this chapter by briefly 
discussing this topic next.¹⁶

In the previous section, we defined Fitting-style operators over finite truth
sets, see Definition 5.4.9. However, unlike the case of the topology Q, finiteness
of the truth set $\mathcal{T}$ is not of much importance here. Therefore, we begin
by noting that we can, in principle, work over any logic $\mathcal{T}$ in which the
truth value in $\mathcal{T}$ of disjunctions of possibly infinite countable
collections of elements of $\mathcal{T}$ can be evaluated. Given this much and a
normal logic program P, one can then easily define a Fitting-style operator as an
operator $F_P : I_{P,\mathcal{T}} \to I_{P,\mathcal{T}}$ which satisfies
$F_P(I)(A) = I(\bigvee C_i)$ for all $I \in I_{P,\mathcal{T}}$ and all
$A \in B_P$. Here, $A \leftarrow \bigvee C_i$ is the pseudo-clause associated with
A, and $I_{P,\mathcal{T}}$ denotes the set of all interpretations defined on $B_P$
taking values in $\mathcal{T}$. The question then arises of providing suitable
conditions under which possibly infinite countable collections of truth values can
be evaluated. This issue is taken up in Section 7.6, where the notion of finitely
determined disjunctions is given in Definition 7.6.1 and is seen to be adequate
for our present purposes. In fact, if disjunctions are finitely determined, then
disjunction is idempotent, commutative, and associative. Furthermore, the converse
of this last statement holds if $\mathcal{T}$ is finite.


16 We do not formally introduce the notion of measurability and refer to [Bartle, 1966] for
necessary background. For full details of the results we sketch here, we refer the reader to
[Seda and Lane, 2005].

For a collection M of subsets of a set X, we denote by $\sigma(M)$ the smallest
$\sigma$-algebra containing M, called the $\sigma$-algebra generated by M. Recall that
a function $f : X \to X$ is measurable with respect to $\sigma(M)$ if and only if
$f^{-1}(A) \in \sigma(M)$ for each $A \in M$. If $\mathcal{B}$ is the subbase of a topology $\tau$
and $\mathcal{B}$ is countable, then $\sigma(\mathcal{B}) = \sigma(\tau)$.

It turns out that Fitting-style operators are not always measurable with
respect to the $\sigma$-algebra $\sigma(Q)$ generated by Q, at least if the underlying truth
set is unrestricted. However, under quite mild conditions, Fitting-style
operators are always measurable, with no syntactic conditions on the program P
whatsoever, as the following result shows. (Note also that we make
no technical use here of the condition that $t_i \leftarrow t_j$ evaluates to true for each
truth value $t_i \in T$.)


5.5.1 Theorem Suppose T is a logic in which $\mathcal{T}$ is a countable set and
disjunctions are finitely determined. Then for any normal logic program P,
the Fitting-style operator $F_P$ determined by P is measurable with respect to
the $\sigma$-algebra $\sigma(Q)$.


As we shall see in Section 7.6, many logics of interest in logic programming
satisfy the requirement that disjunction is finitely determined. Indeed, it is
satisfied for Belnap's logic FOUR, and hence $T_P$, $\Phi_P$, and $\Psi_P$ are all always
measurable for any normal logic program P.


Chapter 6 


Stable and Perfect Model Semantics 


The stable model semantics is the one which currently receives the most
attention. Some of the most popular implementations of non-monotonic
reasoning systems are based on it.$^{1}$ In this chapter, we provide
means to lift our results on the supported model semantics to the stable 
model semantics. This is done by the so-called fixpoint completion of pro- 
grams, which we will introduce in Section 6.1. This construction will enable 
us to draw almost effortlessly a number of corollaries on the stable model 
semantics, and we will do this in Section 6.2. Finally, in Section 6.3, we will 
close our discussion with some additional observations on stratification and 
the perfect model semantics. 


6.1 The Fixpoint Completion 


The fixpoint completion is a program transformation which is based on 
the notion of unfolding, meaning the replacement of a body atom A by the 
body of a clause which also has head A. In essence, the fixpoint completion of 
a given program is obtained by performing (recursively) a complete unfolding 
through all positive body atoms and disregarding all clauses which after this 
process still contain positive body atoms. We will describe this formally in the 
following definition. 


6.1.1 Definition A quasi-interpretation$^{2}$ is a set of clauses of the form
$A \leftarrow \neg B_1, \ldots, \neg B_m$, where A and $B_i$ are ground atoms for all $i = 1, \ldots, m$.
Given a normal logic program P and a quasi-interpretation Q, we define
$T'_P(Q)$ to be the quasi-interpretation consisting of the set of all clauses
$A \leftarrow \mathrm{body}_1, \ldots, \mathrm{body}_n, \neg B_1, \ldots, \neg B_m$ for which there exists a clause
$A \leftarrow A_1, \ldots, A_n, \neg B_1, \ldots, \neg B_m$ in ground(P) and clauses
$A_i \leftarrow \mathrm{body}_i$ in Q for all $i = 1, \ldots, n$. We explicitly allow the cases
n = 0 or m = 0 in this definition.


$^{1}$ See [Leone et al., 2006] for details of the DLV system and [Simons et al., 2002]
for details of the smodels system, for example.

$^{2}$ This notion is due to [Dung and Kanchanasut, 1989]. We stick to the old terminology,
although quasi-interpretations should really be thought of as, and indeed are, programs
with negative body literals only.


Note that the set of all quasi-interpretations is a complete partial order
with respect to set-inclusion.


6.1.2 Proposition Given a normal logic program P, the operator $T'_P$ is Scott
continuous on the set of all quasi-interpretations.


Proof: We show first that $T'_P$ is monotonic. So let $Q \subseteq R$ be
quasi-interpretations, and let $A \leftarrow \mathrm{body}$ be in $T'_P(Q)$. If $A \leftarrow \mathrm{body}$ results from
the unfolding of some clause $A \leftarrow \mathrm{body}'$ in P with some clauses $B_i \leftarrow \mathrm{body}_i$
in Q, then $B_i \leftarrow \mathrm{body}_i$ is contained in R for all i by assumption, and by the
existence of the clause $A \leftarrow \mathrm{body}'$ in P we obtain $A \leftarrow \mathrm{body}$ in $T'_P(R)$ by
unfolding. If $A \leftarrow \mathrm{body} \in T'_P(Q)$ does not result from some unfolding, then it
is already contained in P and, hence, in $T'_P(R)$. Thus, $T'_P$ is monotonic.

Now let $\mathcal{Q} = \{Q_\lambda \mid \lambda \in \Lambda\}$ be an indexed directed family of
quasi-interpretations, and let $Q = \bigsqcup \mathcal{Q} = \bigcup \mathcal{Q}$. Since the order under consideration is
set-inclusion and $T'_P$ is monotonic, we immediately have that $T'_P(\mathcal{Q})$ is directed.
By the remarks following Definition 1.1.7, it therefore remains to show that
$T'_P(Q) \subseteq \bigcup T'_P(\mathcal{Q})$. So suppose that $A \leftarrow \mathrm{body}$ belongs to $T'_P(Q)$. If $A \leftarrow \mathrm{body}$
does not result from an unfolding, then it is already contained in P, hence also
in $\bigcup T'_P(\mathcal{Q})$. Otherwise, $A \leftarrow \mathrm{body}$ results from the unfolding of some $A \leftarrow \mathrm{body}'$
in P with some $B_i \leftarrow \mathrm{body}_i$ in Q. But then there is $\lambda$ such that all $B_i \leftarrow \mathrm{body}_i$
are contained in $Q_\lambda$; hence, $A \leftarrow \mathrm{body}$ is contained in
$T'_P(Q_\lambda) \subseteq \bigcup T'_P(\mathcal{Q})$, as required. ∎


Given a normal logic program P, we define the fixpoint completion fix(P)
of P by $\mathrm{fix}(P) = T'_P \uparrow \omega$.


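6.1.3 Example Consider again the example program Tweety2, see Program
2.3.9. We obtain the following.

$T'_{\mathrm{Tweety2}} \uparrow 0 = \emptyset$

$T'_{\mathrm{Tweety2}} \uparrow 1 = \{\mathrm{penguin(tweety)} \leftarrow,\ \mathrm{bird(bob)} \leftarrow\}$

$T'_{\mathrm{Tweety2}} \uparrow 2 = T'_{\mathrm{Tweety2}} \uparrow 1 \cup \{\mathrm{bird(tweety)} \leftarrow,\ \mathrm{flies(bob)} \leftarrow \neg\mathrm{penguin(bob)}\}$

$T'_{\mathrm{Tweety2}} \uparrow 3 = T'_{\mathrm{Tweety2}} \uparrow 2 \cup \{\mathrm{flies(tweety)} \leftarrow \neg\mathrm{penguin(tweety)}\}$

$\mathrm{fix(Tweety2)} = T'_{\mathrm{Tweety2}} \uparrow 3.$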
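
The iteration in Example 6.1.3 can be replayed mechanically. The following is a minimal Python sketch, not the authors' implementation. It assumes a grounding of Tweety2 over the constants tweety and bob that is reconstructed from the iterates listed above (Program 2.3.9 itself is not reproduced in this section), and the names `clause`, `unfold_step`, and `fixpoint_completion` are hypothetical helpers introduced for illustration only.

```python
from itertools import product

def clause(head, pos=(), neg=()):
    """A ground clause: head <- pos_1, ..., pos_n, not neg_1, ..., not neg_m."""
    return (head, frozenset(pos), frozenset(neg))

# ASSUMPTION: ground version of Tweety2, reconstructed from Example 6.1.3.
TWEETY2 = {
    clause("penguin(tweety)"),
    clause("bird(bob)"),
    clause("bird(tweety)", pos=["penguin(tweety)"]),
    clause("bird(bob)", pos=["penguin(bob)"]),
    clause("flies(tweety)", pos=["bird(tweety)"], neg=["penguin(tweety)"]),
    clause("flies(bob)", pos=["bird(bob)"], neg=["penguin(bob)"]),
}

def unfold_step(program, quasi):
    """One application of T'_P to the quasi-interpretation `quasi`."""
    bodies_by_head = {}
    for head, _, neg in quasi:
        bodies_by_head.setdefault(head, []).append(neg)
    result = set()
    for head, pos, neg in program:
        # Choose one clause of `quasi` for every positive body atom;
        # if some atom has no clause in `quasi`, no unfolding is possible.
        choices = [bodies_by_head.get(atom, []) for atom in sorted(pos)]
        for picked in product(*choices):
            result.add((head, frozenset(), neg.union(*picked)))
    return result

def fixpoint_completion(program):
    """Iterate T'_P from the empty set until it stabilizes (finite programs only)."""
    quasi = set()
    while True:
        nxt = unfold_step(program, quasi)
        if nxt == quasi:
            return quasi
        quasi = nxt

FIX = fixpoint_completion(TWEETY2)
```

Under these assumptions, `FIX` contains exactly the five clauses of fix(Tweety2) given in the example. The loop terminates only because the ground program is finite; for infinite ground programs the limit $T'_P \uparrow \omega$ is not reached in finitely many steps in general.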

The importance of the fixpoint completion lies in the fact that the stable
models of a given program P are exactly the supported models of fix(P). We
can prove an even stronger result.$^{3}$


$^{3}$ The proof of Theorem 6.1.4 is taken directly from [Wendt, 2002a], which appeared in
compressed form as [Wendt, 2002b]. This correspondence can also be carried over to the
Fitting/well-founded semantics. More precisely, it was shown in [Wendt, 2002b] that for any
normal logic program P and any three-valued interpretation I, we have
$\Psi_P(I) = \Phi_{\mathrm{fix}(P)}(I)$, where $\Psi_P$ is the operator due to [Bonnier et al., 1991] used for
characterizing three-valued stable models, which is not treated here. A corollary of the
result just mentioned is that the well-founded model for a given program P coincides with
the Fitting model for fix(P).


6.1.4 Theorem For any normal logic program P and (two-valued)
interpretation I, we have

$GL_P(I) = T_{\mathrm{fix}(P)}(I).$


Proof: We show first that for every $A \in GL_P(I)$ there exists a clause in fix(P)
with head A whose body is true in I, and hence $A \in T_{\mathrm{fix}(P)}(I)$. We show this
by induction on the powers of $T_{P/I}$; recall that $GL_P(I) = T_{P/I} \uparrow \omega$.

For the base case $T_{P/I} \uparrow 0 = \emptyset$, there is nothing to show.

So assume now that for all $A \in T_{P/I} \uparrow n$ there exists a clause in fix(P)
with head A whose body is true in I. For $A \in T_{P/I} \uparrow (n+1)$, there exists a
clause $A \leftarrow A_1, \ldots, A_n$ in P/I such that $A_1, \ldots, A_n \in T_{P/I} \uparrow n$, and hence
by construction of P/I there is a clause $A \leftarrow A_1, \ldots, A_n, \neg B_1, \ldots, \neg B_m$ in
ground(P) with $B_1, \ldots, B_m \notin I$. By our induction hypothesis, we obtain
that for each $i = 1, \ldots, n$ there exists a clause $A_i \leftarrow \mathrm{body}_i$ in fix(P) with
$I \models \mathrm{body}_i$, and hence $A_i \in T_{\mathrm{fix}(P)}(I)$. So by definition of $T'_P$ the clause
$A \leftarrow \mathrm{body}_1, \ldots, \mathrm{body}_n, \neg B_1, \ldots, \neg B_m$ is contained in fix(P). From $I \models \mathrm{body}_i$
and $B_1, \ldots, B_m \notin I$, we obtain $A \in T_{\mathrm{fix}(P)}(I)$, as desired. This finishes the
induction argument, and hence $GL_P(I) \subseteq T_{\mathrm{fix}(P)}(I)$.

Now conversely, assume that $A \in T_{\mathrm{fix}(P)}(I)$. We show that $A \in GL_P(I)$
by proving inductively on k that $T_{T'_P \uparrow k}(I) \subseteq GL_P(I)$ for all $k \in \mathbb{N}$.

For the base case, we have $T_{T'_P \uparrow 0}(I) = \emptyset$, so there is nothing to show.

So assume now that $T_{T'_P \uparrow k}(I) \subseteq GL_P(I)$, and let
$A \in T_{T'_P \uparrow (k+1)}(I) \setminus T_{T'_P \uparrow k}(I)$.
Then there is a clause $A \leftarrow \mathrm{body}_1, \ldots, \mathrm{body}_n, \neg B_1, \ldots, \neg B_m$ in $T'_P \uparrow (k+1)$
whose body is true in I. Thus, $B_1, \ldots, B_m \notin I$, and for each $i = 1, \ldots, n$
there is a clause $A_i \leftarrow \mathrm{body}_i$ in $T'_P \uparrow k$ with $\mathrm{body}_i$ true in I. So
$A_i \in T_{T'_P \uparrow k}(I) \subseteq GL_P(I)$. Furthermore, by definition of $T'_P$, there exists a clause
$A \leftarrow A_1, \ldots, A_n, \neg B_1, \ldots, \neg B_m$ in ground(P), and since $B_1, \ldots, B_m \notin I$, we
obtain $A \leftarrow A_1, \ldots, A_n \in P/I$. Since we know that $A_1, \ldots, A_n \in GL_P(I)$,
we obtain $A \in GL_P(I)$, and hence $T_{T'_P \uparrow (k+1)}(I) \subseteq GL_P(I)$. This finishes the
induction argument, and we obtain $T_{\mathrm{fix}(P)}(I) \subseteq GL_P(I)$. ∎


The following corollary is an immediate consequence of Theorem 6.1.4. 


6.1.5 Corollary Let P be a normal logic program. Then the stable models 
of P are exactly the supported models of fix(P). 
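
Theorem 6.1.4 and Corollary 6.1.5 can be checked by brute force on a small example. The sketch below is illustrative only: it assumes the same reconstructed grounding of Tweety2 as in Example 6.1.3, and all function names are hypothetical. It compares $GL_P(I)$ with $T_{\mathrm{fix}(P)}(I)$ for every two-valued interpretation I over the six ground atoms.

```python
from itertools import chain, combinations, product

def clause(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

# ASSUMPTION: ground Tweety2, reconstructed from Example 6.1.3.
PROGRAM = {
    clause("penguin(tweety)"),
    clause("bird(bob)"),
    clause("bird(tweety)", pos=["penguin(tweety)"]),
    clause("bird(bob)", pos=["penguin(bob)"]),
    clause("flies(tweety)", pos=["bird(tweety)"], neg=["penguin(tweety)"]),
    clause("flies(bob)", pos=["bird(bob)"], neg=["penguin(bob)"]),
}
ATOMS = sorted({a for h, p, n in PROGRAM for a in chain([h], p, n)})

def t_p(program, interp):
    """Single-step operator T_P on a set of true atoms."""
    return {h for h, pos, neg in program if pos <= interp and not (neg & interp)}

def gelfond_lifschitz(program, interp):
    """GL_P(I): least fixed point of T_{P/I}, with P/I the reduct of P by I."""
    reduct = {(h, pos, frozenset()) for h, pos, neg in program if not (neg & interp)}
    model = set()
    while True:
        nxt = t_p(reduct, model)
        if nxt == model:
            return model
        model = nxt

def fixpoint_completion(program):
    """fix(P) for a finite ground program, by iterating the unfolding operator."""
    quasi = set()
    while True:
        by_head = {}
        for head, _, neg in quasi:
            by_head.setdefault(head, []).append(neg)
        nxt = set()
        for head, pos, neg in program:
            choices = [by_head.get(atom, []) for atom in sorted(pos)]
            for picked in product(*choices):
                nxt.add((head, frozenset(), neg.union(*picked)))
        if nxt == quasi:
            return quasi
        quasi = nxt

def all_interpretations(atoms):
    return (set(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r))

FIX = fixpoint_completion(PROGRAM)
# Theorem 6.1.4: GL_P(I) = T_fix(P)(I) for every interpretation I.
AGREE = all(gelfond_lifschitz(PROGRAM, i) == t_p(FIX, i)
            for i in all_interpretations(ATOMS))
# Corollary 6.1.5: stable models of P = supported models of fix(P).
STABLE = [i for i in all_interpretations(ATOMS) if gelfond_lifschitz(PROGRAM, i) == i]
SUPPORTED_FIX = [i for i in all_interpretations(ATOMS) if t_p(FIX, i) == i]
```

The exhaustive enumeration is exponential in the number of atoms, so this is a check for toy programs rather than a practical procedure.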


6.2 Stable Model Semantics 


Theorem 6.1.4 enables us to carry over results on the single-step operator
and on the supported model semantics to the Gelfond–Lifschitz operator and
the stable model semantics, respectively. We will first consider continuity issues.

The following observation is of technical importance.


6.2.1 Proposition Let P be a definite logic program, let $A \in B_P$, and let
$n \in \mathbb{N}$. Then $A \in T_P \uparrow n$ if and only if $A \leftarrow$ is a clause in $T'_P \uparrow n$.


Proof: Let $A \in T_P \uparrow n$ for some $n \in \mathbb{N}$. We proceed by induction on n. If
n = 1, then there is nothing to show. So assume that n > 1. Then there is a
clause $A \leftarrow \mathrm{body}$ in ground(P) such that all atoms $B_i$ in body are contained
in $T_P \uparrow (n-1)$, and by the induction hypothesis there are clauses $B_i \leftarrow$ in
$T'_P \uparrow (n-1)$. Unfolding these clauses with $A \leftarrow \mathrm{body}$ shows that $A \leftarrow$ is also
contained in $T'_P \uparrow n$.

Conversely, assume there is a clause $A \leftarrow$ in $T'_P \uparrow n$. We proceed again by
induction. If n = 1, there is nothing to show. So let n > 1. Then there exists
a clause $A \leftarrow A_1, \ldots, A_k$ in ground(P) and clauses $A_i \leftarrow$ in $T'_P \uparrow (n-1)$. By
the induction hypothesis, we obtain $A_i \in T_P \uparrow (n-1)$ for all i, and hence
$A \in T_P \uparrow n$. ∎


Given a program P, we know by Theorem 6.1.4 that $GL_P$ is continuous
at some $I \in I_P$ in Q if and only if $T_{\mathrm{fix}(P)}$ is continuous at I. This gives rise to
the following theorem.


6.2.2 Theorem Let P be a normal logic program, and let $I \in I_P$. Then
$GL_P$ is continuous at I in Q if and only if whenever $GL_P(I)(A) = \mathbf{f}$, then
either there is no clause with head A in ground(P) or there exists a finite set
$S(I, A) = \{A_1, \ldots, A_k\} \subseteq B_P$ such that $I(A_i) = \mathbf{t}$ for all i and for every clause
$A \leftarrow \mathrm{body}$ in ground(P) at least one $\neg A_i$ or some B with $GL_P(I)(B) = \mathbf{f}$
occurs in body.


Proof: By Theorem 5.4.11 and Theorem 6.1.4, and by observing that there
are no positive body atoms occurring in fix(P), we obtain the following.


$GL_P$ is continuous at I if and only if whenever $GL_P(I)(A) = \mathbf{f}$,
then either there exists no clause with head A in fix(P) or
there exists a finite set $S(I, A) = \{A_1, \ldots, A_k\} \subseteq B_P$ such
that $I(A_i) = \mathbf{t}$ for all i and for every clause $A \leftarrow \mathrm{body}$ in
fix(P) at least one $\neg A_i$ occurs in body.


So let P be such that $GL_P$ is continuous at I. If there is no clause with
head A in ground(P), then there is nothing to show. So assume that there
is a clause with head A in ground(P). Then we already know that there
exists a finite set $S(I, A) = \{A_1, \ldots, A_k\} \subseteq B_P$ such that $I(A_i) = \mathbf{t}$ for
all i and for every clause $A \leftarrow \mathrm{body}$ in fix(P) at least one $\neg A_i$ occurs in
body. Now let $A \leftarrow B_1, \ldots, B_r, \neg C_1, \ldots, \neg C_m$ be a clause in ground(P),
and assume that no $\neg A_i$ occurs in its body. We show that there is some
$B_i$ in body with $GL_P(I)(B_i) = \mathbf{f}$. Assume the contrary, that is, that
$GL_P(I)(B_i) = \mathbf{t}$ for all i. Then for each $B_i$ we have $B_i \in GL_P(I) = T_{P/I} \uparrow \omega$.
As in the proof of Proposition 6.2.1, we conclude that there is a clause
$A \leftarrow \neg D_1, \ldots, \neg D_n, \neg C_1, \ldots, \neg C_m$ in fix(P) with $D_j \notin I$ for all $j = 1, \ldots, n$.
Since the clause $A \leftarrow \neg D_1, \ldots, \neg D_n, \neg C_1, \ldots, \neg C_m$ is contained in fix(P), we
know that some atom from the set S(I, A) must occur in its body. It cannot
occur as any $D_j$ because $I(D_j) = \mathbf{f}$ for all j. It also cannot occur as any $C_j$
by assumption. So we obtain a contradiction, which finishes the argument.
Conversely, let P be such that the condition on $GL_P$ in the statement of
the theorem holds. We will again make use of the observation made at the
beginning of this proof. So let $A \in B_P$ with $GL_P(I)(A) = \mathbf{f}$. If there is no
clause with head A in fix(P), then there is nothing to show. So assume there
is a clause with head A in fix(P). Then there is a clause with head A in P, and
by assumption we know that there exists a finite set $S(I, A) = \{A_1, \ldots, A_k\} \subseteq B_P$
such that $I(A_i) = \mathbf{t}$ for all i and for every clause $A \leftarrow \mathrm{body}$ in ground(P)
at least one $\neg A_i$ or some B with $GL_P(I)(B) = \mathbf{f}$ occurs in body. Now let
$A \leftarrow \neg B_1, \ldots, \neg B_n$ be a clause in $\mathrm{fix}(P) = T'_P \uparrow \omega$. Then there is $k \in \mathbb{N}$ with
$A \leftarrow \neg B_1, \ldots, \neg B_n$ contained in $T'_P \uparrow k$. Note that n = 0 is impossible since
this would imply $GL_P(I)(A) = \mathbf{t}$, contradicting the assumption on A. We
proceed by induction on k. If k = 1, then $A \leftarrow \neg B_1, \ldots, \neg B_n$ is contained
in ground(P); hence, one of the $B_i$ is contained in S(I, A), and this suffices.
For k > 1, there is a clause $A \leftarrow C_1, \ldots, C_m, \neg D_1, \ldots, \neg D_m$ in ground(P)
and clauses $C_i \leftarrow \mathrm{body}_i$ in $T'_P \uparrow (k-1)$ which unfold to $A \leftarrow \neg B_1, \ldots, \neg B_n$.
By assumption we either have $D_j \in S(I, A)$ for some j, in which case there
remains nothing to show, or we have that $GL_P(I)(C_i) = \mathbf{f}$ for some i. In the
latter case we obtain that $\mathrm{body}_i$ is non-empty by an argument similar to that
of the proof of Proposition 6.2.1. So by assumption there is a (negated) atom
B in $\mathrm{body}_i$, and hence B is in $\{B_1, \ldots, B_n\}$. So again one of the $B_i$ is in
S(I, A), and this observation finishes the proof. ∎


We also have the following special instance of Theorem 6.2.2. 


6.2.3 Corollary Let P be a normal logic program without local variables.
Then $GL_P$ is continuous in Q.


Proof: We apply Theorem 6.2.2. Let $I \in I_P$ and $A \in B_P$ be such that
$GL_P(I)(A) = \mathbf{f}$. Since P has no local variables, it is of finite type. Therefore,
the set $\mathcal{B}$ of all negated body atoms in clauses with head A is finite. Let
$S(I, A) = \{B \in \mathcal{B} \mid I(B) = \mathbf{t}\}$; then S(I, A) is also finite. If each clause
with head A contains some negated atom from S(I, A), there is nothing to
prove. So assume that there is a clause $A \leftarrow A_1, \ldots, A_n, \neg B_1, \ldots, \neg B_m$ in
ground(P) with $B_j \notin S(I, A)$ for all j, that is, suppose $I(B_j) = \mathbf{f}$ for all j.
Then $A \leftarrow A_1, \ldots, A_n$ is a clause in P/I and $A \notin T_{P/I} \uparrow \omega$. It now follows that
there is some i with $A_i \notin T_{P/I} \uparrow \omega = GL_P(I)$, and this observation finishes
the argument by Theorem 6.2.2. ∎


Measurability is much simpler to deal with, as we see next. 


6.2.4 Theorem Let P be a normal logic program. Then $GL_P$ is measurable
with respect to $\sigma(Q)$.


Proof: By Theorem 5.5.1 we obtain that $T_{\mathrm{fix}(P)}$ is measurable with respect
to $\sigma(Q)$, and by Theorem 6.1.4 we know that $T_{\mathrm{fix}(P)} = GL_P$. ∎


The following variant of Theorem 5.4.2 can be proven directly. 


6.2.5 Theorem Let P be a normal logic program, and let $GL_P$ be continuous
and such that the sequence of iterates $GL_P^n(I)$ converges in Q to some $M \in I_P$.
Then M is a stable model for P.


Proof: By continuity we obtain $M = \lim GL_P^{n+1}(I) = GL_P(\lim GL_P^n(I)) = GL_P(M)$. ∎
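
As a small illustration of Theorem 6.2.5, one can iterate $GL_P$ from the empty interpretation and watch the sequence settle. The sketch below again assumes the reconstructed grounding of Tweety2 used in Example 6.1.3; the helper names are hypothetical. Because the Herbrand base is finite here, convergence in Q simply means that the sequence of iterates is eventually constant.

```python
def clause(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

# ASSUMPTION: ground Tweety2, reconstructed from Example 6.1.3.
PROGRAM = [
    clause("penguin(tweety)"),
    clause("bird(bob)"),
    clause("bird(tweety)", pos=["penguin(tweety)"]),
    clause("bird(bob)", pos=["penguin(bob)"]),
    clause("flies(tweety)", pos=["bird(tweety)"], neg=["penguin(tweety)"]),
    clause("flies(bob)", pos=["bird(bob)"], neg=["penguin(bob)"]),
]

def gelfond_lifschitz(program, interp):
    """GL_P(I): least model of the reduct P/I."""
    reduct = [(h, pos) for h, pos, neg in program if not (neg & interp)]
    model = set()
    while True:
        nxt = {h for h, pos in reduct if pos <= model}
        if nxt == model:
            return model
        model = nxt

def iterate_gl(program, start, max_steps=50):
    """Return GL_P^0(I), GL_P^1(I), ... up to the first fixed point, if any."""
    seq = [set(start)]
    for _ in range(max_steps):
        nxt = gelfond_lifschitz(program, seq[-1])
        if nxt == seq[-1]:
            return seq
        seq.append(nxt)
    return seq  # no fixed point within max_steps (the iterates may oscillate)

ITERATES = iterate_gl(PROGRAM, set())
```

For this program the iterates first overshoot (flies(tweety) is derived from the empty interpretation) and then drop back to a fixed point of $GL_P$, that is, to a stable model. Since $GL_P$ is not monotonic, such convergence is not guaranteed in general, which is why the theorem assumes it.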


We can also exploit our knowledge about the relationships between the 
single-step operator and the Fitting operator. 


6.2.6 Proposition Let P be a normal logic program, and assume that
$M = \Phi_{\mathrm{fix}(P)} \uparrow \omega$ is total.$^{4}$ Then $GL_P^n(\emptyset)$ converges in Q to $M^+$, and $M^+$ is the
unique stable model for P.


Proof: This follows immediately from Proposition 5.2.7 and Theorem 6.1.4. ∎


Metric-based approaches also carry over to our present context; we restrict 
our discussion to the following corollary of Theorem 5.1.6. 


6.2.7 Theorem Let P be a locally stratified normal logic program with
corresponding level mapping l. Then $GL_P$ is strictly contracting with respect to
$d_l$. If the codomain of l is $\omega$, then $GL_P$ is a contraction with respect to $d_l$.
Furthermore, in both cases, $GL_P$ has a unique fixed point, and therefore P
has a unique stable model.


Proof: If P is locally stratified with respect to l, then fix(P) is locally
hierarchical with respect to l. It thus suffices to apply Theorem 5.1.6 in conjunction
with Theorem 6.1.4. ∎


6.2.8 Remark With the comments already made concerning the fact that
the well-founded model for a given program P coincides with the Fitting
model for fix(P), for any normal program P, we can also derive the following
result.


Let P be a program with total well-founded model $I \cup \neg(B_P \setminus I)$,
where $I \subseteq B_P$. Then $GL_P$ is strictly contracting on the
spherically complete dislocated generalized ultrametric space $(I_P, \varrho)$,
where we have $\varrho(J, K) = \max\{d_l(J, I), d_l(I, K)\}$ for all $J, K \in I_P$,
and l is defined by taking l(A) to be the minimal $\alpha$ such
that $\Phi_{\mathrm{fix}(P)} \uparrow (\alpha + 1)(A) = I(A)$.


Indeed, the program P has a total well-founded model in this case, and this
implies that fix(P) has a total Fitting model. So l as just defined is, in fact,
well-defined, and fix(P) satisfies (F) with respect to $I \cup \neg(B_P \setminus I)$ and l. Now
apply Theorem 5.1.17.


$^{4}$ We mentioned earlier in this chapter that $\Phi_{\mathrm{fix}(P)}$ coincides with the operator $\Psi_P$ from
[Bonnier et al., 1991] for characterizing three-valued stable models.


6.3 Perfect Model Semantics 


We return to matters of stratification and the perfect model semantics.
More precisely, we will describe an iterative method for obtaining the perfect
model for locally stratified programs.$^{5}$


6.3.1 Definition Let P be a normal logic program, and let $l : B_P \to \gamma$ be a
level mapping, where $\gamma > 1$. For each n satisfying $0 < n < \gamma$, let $P_{[n]}$ denote
the set of all clauses in ground(P) in which only atoms A with $l(A) < n$ occur,
and denote by $\mathcal{L}_n$ the set of all atoms A of level l(A) less than n. We define
$T_{[n]} : \mathcal{P}(\mathcal{L}_n) \to \mathcal{P}(\mathcal{L}_n)$ by $T_{[n]}(I) = T_{P_{[n]}}(I)$. The mapping $T_{[n]}$ is called the
immediate consequence operator restricted at level n.


Thus, the idea formalized by this definition is to "cut off" at level n.


6.3.2 Definition Let P be a locally stratified normal logic program, and let
$l : B_P \to \gamma$ be a level mapping, where $\gamma > 1$. We construct the transfinite
sequence $(I_n)_{n \in \gamma}$ inductively as follows. For each $m \in \mathbb{N}$, we put
$I_{[1,m]} = T_{[1]}^m(\emptyset)$ and set $I_1 = \bigcup_{m=0}^{\infty} I_{[1,m]}$. If $n \in \gamma$, where n > 1, is a successor
ordinal, then for each $m \in \mathbb{N}$ we put $I_{[n,m]} = T_{[n]}^m(I_{n-1})$ and set
$I_n = \bigcup_{m=0}^{\infty} I_{[n,m]}$. If $n \in \gamma$ is a limit ordinal, we put $I_n = \bigcup_{m<n} I_m$.
Finally, we put $I_{[P]} = \bigcup_{n<\gamma} I_n$.


6.3.3 Example Consider again the example program Tweety2,
Program 2.3.9, where penguin(X) is assigned level 0, bird(X) is assigned level
1, and flies(X) is assigned level 2, for all $X \in \{\mathrm{tweety}, \mathrm{bob}\}$. We obtain the
following.


$I_1 = \{\mathrm{penguin(tweety)}\}$
$I_2 = I_1 \cup \{\mathrm{bird(bob)}, \mathrm{bird(tweety)}\}$
$I_3 = I_2 \cup \{\mathrm{flies(bob)}\}$


$I_{[\mathrm{Tweety2}]} = I_3.$


$^{5}$ For further details, we refer the reader to the paper [Seda and Hitzler, 1999b].
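
For finite level mappings, the construction of Definition 6.3.2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the ground clauses of Tweety2 are reconstructed from Example 6.1.3, the level assignment is the one given in Example 6.3.3, and `restricted`, `t_p`, and `perfect_model_iterates` are hypothetical names. Limit ordinals are not handled, and the inner loop is only guaranteed to stop for finite, locally stratified ground programs, where the iterates $I_{[n,m]}$ increase with m.

```python
def clause(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

# ASSUMPTION: ground Tweety2, reconstructed from Example 6.1.3.
PROGRAM = [
    clause("penguin(tweety)"),
    clause("bird(bob)"),
    clause("bird(tweety)", pos=["penguin(tweety)"]),
    clause("bird(bob)", pos=["penguin(bob)"]),
    clause("flies(tweety)", pos=["bird(tweety)"], neg=["penguin(tweety)"]),
    clause("flies(bob)", pos=["bird(bob)"], neg=["penguin(bob)"]),
]

def level(atom):
    """Level mapping of Example 6.3.3, keyed on the predicate symbol."""
    return {"penguin": 0, "bird": 1, "flies": 2}[atom.split("(")[0]]

def restricted(program, n):
    """P_[n]: the clauses in which only atoms of level < n occur."""
    return [c for c in program if all(level(a) < n for a in {c[0]} | c[1] | c[2])]

def t_p(program, interp):
    """Single-step operator on a set of true atoms."""
    return {h for h, pos, neg in program if pos <= interp and not (neg & interp)}

def perfect_model_iterates(program, top):
    """Return [I_1, ..., I_top], computing I_n as the union of the I_[n,m]."""
    iterates, current = [], set()
    for n in range(1, top + 1):
        p_n = restricted(program, n)
        stage, union = set(current), set(current)   # I_[n,0] = I_{n-1}
        while True:
            nxt = t_p(p_n, stage)                   # I_[n,m+1] = T_[n](I_[n,m])
            union |= nxt
            if nxt == stage:
                break
            stage = nxt
        current = union
        iterates.append(set(current))
    return iterates

I_1, I_2, I_3 = perfect_model_iterates(PROGRAM, 3)
```

With these inputs the three sets agree with $I_1$, $I_2$, and $I_3$ in Example 6.3.3. Note that $T_{[3]}$ is applied to a program with negation, so it is not monotonic, yet the stages still increase, as Lemma 6.3.4 asserts for locally stratified programs.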


The main technical lemma we need is as follows. For its proof, which is by
transfinite induction, it will be convenient to put $I_{[n,m]} = I_n$ for all $m \in \mathbb{N}$
whenever n is a limit ordinal; thus, statement (b) in the lemma makes sense
for all ordinals n.


6.3.4 Lemma Let P be a normal logic program which is locally stratified
with respect to the level mapping $l : B_P \to \gamma$, where $\gamma > 1$. Then the following
statements hold.


(a) The sequence $(I_n)_{n \in \gamma}$ is monotonic increasing in n.


(b) For every $n \in \gamma$, where $n \ge 1$, the sequence $(I_{[n,m]})$ is monotonic increasing
in m.


(c) For every $n \in \gamma$, where $n \ge 1$, $I_n$ is a fixed point of $T_{[n]}$.


(d) If $l(B) < n$ and $B \notin I_n$, where $B \in B_P$, then for every $m \in \gamma$ with $n < m$
we have $B \notin I_m$ and, hence, $B \notin I_{[P]}$. In particular, if $l(B) < n$ and
$B \notin I_{[n+1,m]}$ for some $m \in \mathbb{N}$, then $B \notin I_n$ and, hence, $B \notin I_{[P]}$.


Proof: It is immediate from the construction that the sequence $(I_n)_{n \in \gamma}$ is
monotonic increasing in n, and this establishes (a).

The main work is in proving (b) and (c), which we treat simultaneously. To
do this, we need to note the technical fact that, for each $n \in \gamma$, we can partition
$P_{[n+1]}$ as $P_{[n]} \cup P(n)$, where P(n) denotes the subset of ground(P) consisting
of those clauses whose head has level n. Thus, $T_{[n+1]}(I) = T_{[n]}(I) \cup T_{P(n)}(I)$
for any $I \in I_P$; note that if $A \in T_{P(n)}(I)$, then l(A) = n.

Let $\mathbf{P}(n)$ be the proposition, depending on the ordinal n, that $(I_{[n,m]})$
is monotonic increasing in m and that $I_n$ is a fixed point of $T_{[n]}$. Suppose
that $\mathbf{P}(n)$ holds for all $n < \alpha$, where $\alpha < \gamma$ is some ordinal. We must show
that $\mathbf{P}(\alpha)$ holds. Indeed, $\mathbf{P}(1)$ holds since $P_{[1]}$ is a definite program and the
construction of $I_1$ is simply the classical construction of the least fixed point
of $T_{[1]}$. Therefore, we may assume that $\alpha \ge 2$. It will be convenient to break
up the details of the case when $\alpha$ is a successor ordinal into the four steps (1)
to (4) below.

Case i. $\alpha = k + 1$ is a successor ordinal. Thus, $\mathbf{P}(k)$ holds.

(1) We have the following recursion equations:$^{6}$

$I_{[k+1,0]} = I_k$
$I_{[k+1,m+1]} = I_k \cup T_{P(k)}(I_{[k+1,m]})$


and the first is immediate. Putting m = 0, we have
$I_{[k+1,1]} = T_{[k+1]}(I_k) = T_{[k]}(I_k) \cup T_{P(k)}(I_k) = I_k \cup T_{P(k)}(I_k) = I_k \cup T_{P(k)}(I_{[k+1,0]})$,
using the fact that $I_k$ is a fixed point of $T_{[k]}$. Now suppose that the second of
these equations holds for some $m \ge 0$. Then


$I_{[k+1,(m+1)+1]} = T_{[k+1]}(I_{[k+1,m+1]})$
$\quad = T_{[k]}(I_{[k+1,m+1]}) \cup T_{P(k)}(I_{[k+1,m+1]})$
$\quad = T_{[k]}(I_k \cup T_{P(k)}(I_{[k+1,m]})) \cup T_{P(k)}(I_{[k+1,m+1]}),$


and it suffices to show that $T_{[k]}(I_k \cup T_{P(k)}(I_{[k+1,m]})) = I_k$. So suppose that
$A \in T_{[k]}(I_k \cup T_{P(k)}(I_{[k+1,m]}))$. Thus, there is a clause in $P_{[k]}$ of the form
$A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1}$, where $A_1, \ldots, A_{k_1} \in I_k \cup T_{P(k)}(I_{[k+1,m]})$
and $B_1, \ldots, B_{l_1} \notin I_k \cup T_{P(k)}(I_{[k+1,m]})$. But then level considerations and the
hypothesis concerning P imply that $A_1, \ldots, A_{k_1} \in I_k$ and $B_1, \ldots, B_{l_1} \notin I_k$.
Therefore, $A \in T_{[k]}(I_k) = I_k$, and the inclusion
$T_{[k]}(I_k \cup T_{P(k)}(I_{[k+1,m]})) \subseteq I_k$
holds. The reverse inclusion is demonstrated in like fashion, showing that the
second of the recursion equations holds with m replaced by m + 1 and, hence,
by induction on m, that it holds for all m.

(2) We have the inclusions
$T_{P(k)}(I_k) \subseteq T_{P(k)}(I_k \cup T_{P(k)}(I_k)) \subseteq T_{P(k)}(I_k \cup T_{P(k)}(I_k \cup T_{P(k)}(I_k))) \subseteq \cdots$.
These inclusions are established by methods similar
to those we have just employed, and we omit the details.

It is now clear from this fact and the recursion equations in Step (1)
that $(I_{[k+1,m]})$, or $(I_{[\alpha,m]})$, is monotonic increasing in m. Since monotonic
increasing sequences converge to their union in Q, and $I_{[k+1,m]}$ is an iterate
of $I_k$, it now follows by Theorem 5.4.2 that $I_{k+1}$ is a model for $P_{[k+1]}$.

(3) If $B \in B_P$ and $l(B) < k$, then $B \in I_{k+1}$ if and only if $B \in I_k$.

Indeed, if $B \in I_k$, then it is clear from the recursion equations of Step (1)
that $B \in I_{k+1}$. On the other hand, if $B \notin I_k$, then it is equally clear from
the recursion equations and level considerations that, for every $m \in \mathbb{N}$,
$B \notin I_{[k+1,m]}$ and, hence, that $B \notin I_{k+1}$, as required.

(4) $I_{k+1}$ is a supported model for $P_{[k+1]}$.

To see that this claim holds, suppose that $A \in I_{k+1} = \bigcup_{m=0}^{\infty} I_{[k+1,m]}$. Then
there is $m_0 \in \mathbb{N}$ such that $A \in I_{[k+1,m+1]} = T_{[k+1]}^{m+1}(I_k)$ for all $m \ge m_0$.
Thus, $A \in T_{[k+1]}(T_{[k+1]}^{m_0}(I_k)) = T_{[k+1]}(I_{[k+1,m_0]})$. Hence, there is a clause
$A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1}$ in $P_{[k+1]}$ such that each $A_i \in I_{[k+1,m_0]}$ and
no $B_j \in I_{[k+1,m_0]}$. But $l(B_j) < k$ for each j since P is locally stratified. Since
$B_j \notin I_{[k+1,m_0]}$, we now see from the recursion equations that $B_j \notin I_k$. From
the result in Step (3) we now deduce that, for each j, $B_j \notin I_{k+1}$. Since it is
obvious that each $A_i$ belongs to $I_{k+1}$, we obtain that $A \in T_{[k+1]}(I_{k+1})$. Thus,
$I_{k+1} \subseteq T_{[k+1]}(I_{k+1})$, and therefore $I_{k+1}$ is a supported model for $P_{[k+1]}$, or a
fixed point of $T_{[k+1]}$, as required.

Thus, $\mathbf{P}(\alpha)$ holds when $\alpha$ is a successor ordinal.


$^{6}$ As shown here, it results from these equations that the process of constructing
$I_{[k+1,m+1]}$ in terms of $I_{[k+1,m]}$ is inflationary, where, formally, an operator G defined on
a collection of sets is said to be inflationary if $X \subseteq G(X)$ for each set X in the given
collection; see also the corresponding recursion equations in Corollary 6.3.5.

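Case ii. $\alpha$ is a limit ordinal.

In this case, it is trivial that $(I_{[\alpha,m]})$ is monotonic increasing in m.
Thus, we have only to show that $I_\alpha$ is a fixed point of $T_{[\alpha]}$, that is, a
supported model for $P_{[\alpha]}$, and we show first that $I_\alpha$ is a model for $P_{[\alpha]}$. Let
$A \in T_{[\alpha]}(I_\alpha)$. Then there is a clause $A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1}$ in
$P_{[\alpha]}$ such that $A_1, \ldots, A_{k_1} \in I_\alpha$ and $B_1, \ldots, B_{l_1} \notin I_\alpha$. Indeed, by the
definition of $P_{[\alpha]}$ and the hypothesis concerning P, there is $n_0 < \alpha$ such that the
clause $A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1}$ belongs to $P_{[n_0]}$. Since the sequence
$(I_n)_{n \in \gamma}$ is monotone increasing and $I_\alpha = \bigcup_{n<\alpha} I_n$, there is $n_1 < \alpha$ such
that $A_1, \ldots, A_{k_1} \in I_{n_1}$ and $B_1, \ldots, B_{l_1} \notin I_{n_1}$. Choosing $n_2 = \max\{n_0, n_1\}$,
we have $A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1} \in P_{[n_2]}$ and also $A_1, \ldots, A_{k_1} \in I_{n_2}$
and $B_1, \ldots, B_{l_1} \notin I_{n_2}$. Therefore, on using the induction hypothesis, we have
$A \in T_{[n_2]}(I_{n_2}) = I_{n_2} \subseteq I_\alpha$. Hence, $T_{[\alpha]}(I_\alpha) \subseteq I_\alpha$, as required.

To see that $I_\alpha$ is supported, let $A \in I_\alpha$. By monotonicity of $(I_n)_{n \in \gamma}$
again and the identity $I_\alpha = \bigcup_{n<\alpha} I_n$, there is a successor ordinal $n_0 > 1$
such that $A \in I_n$ for all n such that $n_0 \le n < \alpha$. In particular, we
have $A \in I_{n_0} = \bigcup_{m=0}^{\infty} I_{[n_0,m]}$. Therefore, there is $m_1 \in \mathbb{N}$ such that
$A \in I_{[n_0,m_1+1]} = T_{[n_0]}(T_{[n_0]}^{m_1}(I_{n_0-1}))$. Consequently, there is a clause
$A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1}$ in $P_{[n_0]}$ such that
$A_1, \ldots, A_{k_1} \in T_{[n_0]}^{m_1}(I_{n_0-1}) = I_{[n_0,m_1]} \subseteq I_{n_0} \subseteq I_\alpha$
and $B_1, \ldots, B_{l_1} \notin I_{[n_0,m_1]}$. But $l(B_j) < n_0 - 1$ for each
j, and so no $B_j$ belongs to $I_{n_0-1}$ by Step (3) of the previous case. Therefore,
by this step, no $B_j$ belongs to $I_{n_0}$, and by iterating this we see that, for
every $m \in \mathbb{N}$, no $B_j$ belongs to $I_{n_0+m}$. Therefore, no $B_j$ belongs to $I_\alpha$. Hence,
we have $A \in T_{[n_0]}(I_\alpha) \subseteq T_{[\alpha]}(I_\alpha)$ or, in other words, that $I_\alpha \subseteq T_{[\alpha]}(I_\alpha)$, as
required.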

It now follows that $\mathbf{P}(n)$ holds for all ordinals n, and this completes the
proof of (b) and (c). In particular, we see that the recursion equations obtained
in Step (1) hold for all ordinals k, and we record this fact in the corollary below.
Indeed, all that is needed to establish these equations is the fact that each $I_k$
is a fixed point of $T_{[k]}$. Note also that the proof just given shows that
$I_{[P]}$ is a fixed point of $T_P$. In turn, (d) of the lemma now follows from this
observation by iterating Step (3).


The proof of the lemma is therefore complete. ∎


It can be seen here, and it will be seen again later, that the importance of
(d) is the control it gives over negation in the manner illustrated in the proof
just given that $I_{k+1}$ is a supported model for $P_{[k+1]}$. It is also worth noting
that the construction produces a monotonic increasing sequence by means of
a non-monotonic operator.$^{7}$


6.3.5 Corollary Suppose the hypothesis of Lemma 6.3.4 holds. Then the
following statements hold.


(a) For all ordinals n and all $m \in \mathbb{N}$, we have the recursion equations


$I_{[n+1,0]} = I_n$, and


$I_{[n+1,m+1]} = I_n \cup T_{P(n)}(I_{[n+1,m]})$.


(b) If P is, in fact, locally hierarchical, then for every ordinal n > 1 we have
$I_{[n+1,m]} = I_n \cup T_{P(n)}(I_n)$ for all $m \in \mathbb{N}$, where P(n) is defined as in the
proof of Lemma 6.3.4, and therefore the iterates stabilize after one step.


Proof: That (a) holds has already been noted in the proof of Lemma 6.3.4.
For (b), it suffices to prove that $T_{P(n)}(I_n) = T_{P(n)}(I_n \cup T_{P(n)}(I_n))$. So
suppose therefore that $A \in T_{P(n)}(I_n \cup T_{P(n)}(I_n))$. Then there is a clause
$A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1}$ in P(n) such that $A_1, \ldots, A_{k_1} \in I_n \cup T_{P(n)}(I_n)$
and $B_1, \ldots, B_{l_1} \notin I_n \cup T_{P(n)}(I_n)$. From these statements and by level
considerations, we have $A_1, \ldots, A_{k_1} \in I_n$ and $B_1, \ldots, B_{l_1} \notin I_n$. Therefore,
$A \in T_{P(n)}(I_n)$, so that $T_{P(n)}(I_n \cup T_{P(n)}(I_n)) \subseteq T_{P(n)}(I_n)$. The reverse
inclusion is established similarly to complete the proof. ∎


Statement (b) of this corollary makes the calculation of iterates very easy 
to perform in the case of locally hierarchical programs. 


6.3.6 Theorem Suppose that P is a normal logic program which is locally
stratified with respect to the level mapping $l : B_P \to \gamma$. Then $I_{[P]}$ is a minimal
supported model for P.


Proof: That $I_{[P]}$ is a supported model for P follows from the proof of
Lemma 6.3.4, and so it remains to show that $I_{[P]}$ is minimal. To do this,
we establish by transfinite induction the following proposition: "if $J \subseteq I_{[P]}$
and $T_P(J) \subseteq J$, then $I_n \subseteq J$ for all $n \in \gamma$, where $n \ge 1$", and this clearly
suffices. Indeed, $T_{[1]}(J) \subseteq T_P(J) \subseteq J$, and therefore J is a model for $P_{[1]}$.
But, as already noted in proving Lemma 6.3.4, $I_1$ is the least model for $P_{[1]}$
by construction, since $P_{[1]}$ is definite. Therefore, $I_1 \subseteq J$, and the proposition
holds with n = 1.

Now assume that the proposition holds for all ordinals $n < \alpha$ for some
ordinal $\alpha \in \gamma$, where $\alpha > 1$; we show that it holds with $n = \alpha$.

Case i. $\alpha = k + 1$ is a successor ordinal, where k > 0. We have $I_k \subseteq J$.
We show by induction on m that $I_{[k+1,m]} \subseteq J$ for all m. Indeed, with
m = 0, we have $I_{[k+1,0]} = I_k \subseteq J$. Suppose, therefore, that $I_{[k+1,m_0]} \subseteq J$
for some $m_0 \ge 0$. Let $A \in I_{[k+1,m_0+1]} = T_{[k+1]}(I_{[k+1,m_0]})$. Then there is
a clause $A \leftarrow A_1, \ldots, A_{k_1}, \neg B_1, \ldots, \neg B_{l_1}$ in $P_{[k+1]}$ such that
$A_1, \ldots, A_{k_1} \in T_{[k+1]}^{m_0}(I_k) = I_{[k+1,m_0]}$ and $B_1, \ldots, B_{l_1} \notin I_{[k+1,m_0]}$. But $l(B_j) < k$ for each j.
Applying Lemma 6.3.4 (d) we see that no $B_j$ belongs to $I_{[P]}$, and consequently
no $B_j$ belongs to J because $J \subseteq I_{[P]}$. Since $I_{[k+1,m_0]} \subseteq J$ by assumption, we
have $A_1, \ldots, A_{k_1} \in J$. Therefore, $A \in T_{[k+1]}(J) \subseteq T_P(J) \subseteq J$, and from this
we obtain that $I_{[k+1,m_0+1]} \subseteq J$, as required to complete the proof in this case.

Case ii. $\alpha$ is a limit ordinal. In this case, $I_\alpha = \bigcup_{n<\alpha} I_n$ and $I_n \subseteq J$ for all
$n < \alpha$ by hypothesis. Therefore, $I_\alpha \subseteq J$, as required.

Thus, the result follows by transfinite induction. ∎


$^{7}$ Lemma 6.3.4 plays a role here similar to that played by [Apt et al., 1988, Lemma 10].


We can strengthen Theorem 6.3.6. 


6.3.7 Theorem Suppose that P is a normal logic program which is locally
stratified with respect to a level mapping $l : B_P \to \gamma$, where $\gamma$ is a countable
ordinal. Then $I_{[P]}$ is a perfect model for P.


Proof: Suppose that there is a model N for P which is preferable to $I_{[P]}$ (and
therefore distinct from $I_{[P]}$); we will derive a contradiction.

First note that $N \setminus I_{[P]}$ must be non-empty; otherwise, we have $N \subseteq I_{[P]}$.
But this inclusion forces equality of N and $I_{[P]}$ since $I_{[P]}$ is a minimal model
for P, and therefore N and $I_{[P]}$ are not distinct. This means that there is a
ground atom A in $N \setminus I_{[P]}$, which can be chosen so that l(A) has minimum
value; let B be a ground atom in $I_{[P]} \setminus N$ corresponding to A in accordance
with Definition 2.5.2 and satisfying $l(A) > l(B)$.

Next we note that $T_{[1]}(N) \subseteq T_P(N) \subseteq N$, since N is a model for P. Hence,
N is a model for $P_{[1]}$, which implies that $I_1 \subseteq N$ since $I_1$ is the least model for
the definite program $P_{[1]}$. Therefore, B can be chosen so that $B \in I_{n_0} \setminus N$, with
minimal $n_0 > 1$. Now $n_0$ cannot be a limit ordinal; otherwise, we would have
$I_{n_0} = \bigcup_{m<n_0} I_m$, from which we would conclude that $B \in I_m \setminus N$ for some
$m < n_0$ contrary to the choice of $n_0$. Thus, $n_0$ must be a successor ordinal,
and therefore B can be chosen so that $B \in I_{[n_0,m_0]} \setminus N$, where $m_0$ is such that
$I_{[n_0,m_1]} \setminus N = \emptyset$ whenever $m_1 < m_0$; indeed, since $I_1 \subseteq N$, we must have
$n_0 > 1$ and $m_0 \ge 1$ also. Consequently, $B \in T_{[n_0]}(I_{[n_0,m_0-1]}) \setminus N$, showing that
there is a clause $B \leftarrow C_1, \ldots, C_{k_1}, \neg D_1, \ldots, \neg D_{l_1}$ in $P_{[n_0]}$ with the property
that each $C_i \in I_{[n_0,m_0-1]}$ and no $D_j \in I_{[n_0,m_0-1]}$. Since $l(D_j) < n_0 - 1$ for
each j, we see that none of the $D_j$ belong to $I_{[P]}$ by Lemma 6.3.4 (d). But
all the $C_i$, if there are any, must belong to N by the choice of the numbers
$n_0$ and $m_0$. Moreover, there must be at least one $D_j$ and indeed at least one
belonging to N. For if there were no $D_j$ or we had each $D_j \notin N$, then we
would have $B \in T_{[n_0]}(N) \subseteq T_P(N) \subseteq N$, using again the fact that N is a
model for P. But this leads to the conclusion that $B \in N$, which is contrary
to $B \in I_{n_0} \setminus N$. Thus, there is a $D = D_j \in N \setminus I_{[P]}$, for some j, satisfying
$l(D) < l(B) < l(A)$. Since A was chosen in $N \setminus I_{[P]}$ to have smallest level, we
have a contradiction.

This contradiction shows that $I_{[P]}$ must be a perfect model for P, as
required. ∎


6.3.8 Program Since locally stratified programs are a generalization of
locally hierarchical programs, it is clear that each locally hierarchical program
has a unique perfect model. This does not hold, however, for $\Phi^*$-accessible
programs. Indeed, the program


p ← ¬q
V E 


is $\Phi^*$-accessible (even acceptable) with respect to the unique supported model
M = {p}. However, I = {q} is also a model for this program, and while I 
is preferable to M, M, in turn, is also preferable to I, so P does not have a 
perfect model. 


We finally return to the special case of stratified programs. We temporarily
introduce the powers of an operator T mapping a complete lattice to itself:8


$$T \uparrow 0(I) = I$$
$$T \uparrow (n+1)(I) = T(T \uparrow n(I)) \cup T \uparrow n(I)$$
$$T \uparrow \omega(I) = \bigcup_{n=0}^{\infty} T \uparrow n(I).$$


Of course, $T \uparrow n(I)$ is not equal to $T^n(I)$ unless T happens to be monotonic
and $I \subseteq T(I)$. Indeed, the sequence $(T \uparrow n(I))_n$ is always monotonic increasing
whether or not T is monotonic. However, this concept can be used to construct
an associated model $M_P$ for any stratified program P as follows. We put $M_0 = \emptyset$,
$M_1 = T_{P_1} \uparrow \omega(M_0)$, $\ldots$, $M_m = T_{P_m} \uparrow \omega(M_{m-1})$. Finally, let $M_P = M_m$.

We will show that $M_P$ is the perfect model for P, for stratified P. To do
this, it will be convenient to introduce the concept $T \Uparrow n(I)$ for a mapping
$T : I_P \to I_P$ and $I \in I_P$. In fact, $T \Uparrow n(I)$ is defined inductively as follows:


$$T \Uparrow 0(I) = I$$
$$T \Uparrow (n+1)(I) = T(T \Uparrow n(I)) \cup I$$
$$T \Uparrow \omega(I) = \bigcup_{n=0}^{\infty} T \Uparrow n(I).$$
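
The two kinds of powers and the construction of $M_P$ can be illustrated for finite ground programs by the following minimal Python sketch. The clause encoding, the function names `tp`, `up`, `dup`, `up_omega`, and the two-stratum program `P1`, `P2` are assumptions made purely for illustration; they do not come from the text. The function `up` follows the recursion in which the previous power is added at each step, and `dup` follows the recursion in which the starting set I is added at each step.

```python
from typing import Callable, FrozenSet, List, Tuple

Interp = FrozenSet[str]
# Ground clause: (head, positive body atoms, atoms occurring negated in the body).
Clause = Tuple[str, Tuple[str, ...], Tuple[str, ...]]
Operator = Callable[[Interp], Interp]


def tp(program: List[Clause]) -> Operator:
    """Immediate consequence operator of a finite ground normal program."""
    def op(i: Interp) -> Interp:
        return frozenset(
            head for head, pos, neg in program
            if all(a in i for a in pos) and not any(b in i for b in neg)
        )
    return op


def up(t: Operator, i: Interp, n: int) -> Interp:
    """Cumulative power: next = T(current) united with current."""
    x = i
    for _ in range(n):
        x = t(x) | x
    return x


def dup(t: Operator, i: Interp, n: int) -> Interp:
    """Second kind of power: next = T(current) united with the start set I."""
    x = i
    for _ in range(n):
        x = t(x) | i
    return x


def up_omega(t: Operator, i: Interp) -> Interp:
    """Limit of the cumulative powers; it stabilizes for a finite program."""
    x = i
    while True:
        nxt = t(x) | x
        if nxt == x:
            return x
        x = nxt


# Hypothetical stratified program P = P1 ∪ P2 (illustration only).
P1: List[Clause] = [("q", (), ()), ("r", ("q",), ())]
P2: List[Clause] = [("p", ("r",), ("s",)), ("u", ("p",), ())]

M0: Interp = frozenset()
M1 = up_omega(tp(P1), M0)   # model of the first stratum
M2 = up_omega(tp(P2), M1)   # M_P for this two-stratum program
```

For this particular program, the two kinds of powers of the second stratum's operator, both started at `M1`, agree at every stage, which is the behaviour asserted in general for stratified programs by Proposition 6.3.10.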


6.3.9 Theorem Let P be a stratified normal logic program. Then $I_{[P]} = M_P$.


Proof: As usual, we take the stratification to be $P = P_1 \cup \ldots \cup P_m$, and we
will show by induction that $I_k = M_k$ for $k = 1, \ldots, m$ and that $I_k = M_m$ for
$k > m$. From this we clearly have $I_{[P]} = M_m = M_P$, as required.


8This and the following construction of $M_P$ were introduced in [Apt et al., 1988].




With the definition of the level mapping we are currently using and with
the conventions we have made regarding the stratification, we note first that
the equalities $P_{[k]} = \mathrm{ground}(P_1 \cup P_2 \cup \ldots \cup P_k)$ and $P(k-1) = \mathrm{ground}(P_k)$ both
hold for $k = 1, \ldots, m$, where $P(k)$ is as defined in the proof of Lemma 6.3.4.

Now $P_{[1]} = \mathrm{ground}(P_1)$ is definite, even if empty, and so it is immediate
that $T_{P_1} \Uparrow i(M_0) = T_{P_1} \uparrow i(M_0)$ for all $i$ and that $I_1 = M_1$. So suppose
next that $T_{P_{k+1}} \Uparrow i(M_k) = T_{P_{k+1}} \uparrow i(M_k)$ for all $i$ and that $I_{k+1} = M_{k+1}$ for
some $k \geq 0$. Then $T_{P_{k+2}} \Uparrow 0(M_{k+1}) = M_{k+1} = T_{P_{k+2}} \uparrow 0(M_{k+1})$ and also
$I_{[k+2,0]} = I_{k+1} = M_{k+1} = T_{P_{k+2}} \uparrow 0(M_{k+1})$. So now suppose that
$T_{P_{k+2}} \Uparrow m(M_{k+1}) = T_{P_{k+2}} \uparrow m(M_{k+1})$ and that $I_{[k+2,m]} = T_{P_{k+2}} \uparrow m(M_{k+1})$ for some
$m \geq 0$. Then $T_{P_{k+2}} \Uparrow (m+1)(M_{k+1}) = T_{P_{k+2}}(T_{P_{k+2}} \Uparrow m(M_{k+1})) \cup M_{k+1}$
and $T_{P_{k+2}} \uparrow (m+1)(M_{k+1}) = T_{P_{k+2}}(T_{P_{k+2}} \uparrow m(M_{k+1})) \cup T_{P_{k+2}} \uparrow m(M_{k+1})$,
and it is clear that $T_{P_{k+2}} \Uparrow (m+1)(M_{k+1}) \subseteq T_{P_{k+2}} \uparrow (m+1)(M_{k+1})$. For
the reverse inclusion, we note that under our present hypotheses we have
$T_{P_{k+2}} \uparrow (m+1)(M_{k+1}) \subseteq T_{P_{k+2}}(T_{P_{k+2}} \Uparrow m(M_{k+1})) \cup T_{P_{k+2}} \Uparrow m(M_{k+1})$, and so
it suffices to show that $T_{P_{k+2}} \Uparrow m(M_{k+1}) \subseteq T_{P_{k+2}}(T_{P_{k+2}} \uparrow m(M_{k+1})) \cup M_{k+1}$ or,
in other words, that $I_{[k+2,m]} \subseteq T_{P(k+1)}(I_{[k+2,m]}) \cup I_{k+1}$. Since this latter set is
equal to $I_{[k+2,m+1]}$ by the recursion equations of Corollary 6.3.5, the inclusion
we want follows from the monotonicity of the sets $I_{[k+2,m]}$ relative to $m$. We
conclude, therefore, that $T_{P_{k+2}} \Uparrow (m+1)(M_{k+1}) = T_{P_{k+2}} \uparrow (m+1)(M_{k+1})$.

Finally, $I_{[k+2,m+1]} = I_{k+1} \cup T_{P(k+1)}(I_{[k+2,m]}) = M_{k+1} \cup T_{P_{k+2}}(T_{P_{k+2}} \uparrow m(M_{k+1}))
= M_{k+1} \cup T_{P_{k+2}}(T_{P_{k+2}} \Uparrow m(M_{k+1})) = T_{P_{k+2}} \Uparrow (m+1)(M_{k+1}) =
T_{P_{k+2}} \uparrow (m+1)(M_{k+1})$, by the conclusions of the previous paragraph. Therefore,
$I_{[k+2,m+1]} = T_{P_{k+2}} \uparrow (m+1)(M_{k+1})$. From this we obtain, by induction,
the equality $I_{[k+2,m]} = T_{P_{k+2}} \uparrow m(M_{k+1})$ for all $m$ and with it the equality
$I_{k+2} = M_{k+2}$, as required. ∎


The details of the induction proof just given also establish the following 
proposition. 


6.3.10 Proposition Let $P = P_1 \cup \ldots \cup P_m$ be a stratified normal logic
program. Then we have that $T_{P_{k+1}} \Uparrow i(M_k) = T_{P_{k+1}} \uparrow i(M_k)$ for all $i$ and
$k = 0, \ldots, m-1$.


Finally, we show that locally stratified programs have a unique perfect 
model, which is also their total weakly perfect model. 


6.3.11 Theorem Let P be locally stratified. Then P has a total weakly perfect
model which is a perfect model for P. Furthermore, this model is independent
of the choice of level mapping with respect to which P is locally
stratified.9


Proof: We will employ Theorem 2.5.9 to establish the claim. Let P be locally
stratified with respect to some level mapping $l'$. Consider the equations
established in Corollary 6.3.5 (a) and define the level mapping $l$ mapping to
pairs of ordinals as follows. For $A \in I_{[P]}$ let $l(A) = (l'(A), m)$, where $m$ is
least such that $A \in I_{[l'(A)+1,m+1]}$. For $A \notin I_{[P]}$ let $l(A) = (l'(A)+1, 0)$. The
recursion equations from Corollary 6.3.5 (a) together with the fact that P is
locally stratified thus allow us to conclude that (WSi), (WSiib), or (WSiic)
is always satisfied with respect to $I_{[P]}$ and $l$. Since $I_{[P]}$ is total, we obtain by
Theorem 2.5.9 that $I_{[P]} \cup \neg(B_P \setminus I_{[P]})$ is the (total) weakly perfect model for
P. Since every program has only one weakly perfect model, and we have just
seen that the weakly perfect model for P coincides with $I_{[P]}$, we conclude that
the model $I_{[P]}$ as constructed by Theorem 6.3.7 is independent of the choice
of level mapping with respect to which P is locally stratified. ∎


9In fact, it is known that every locally stratified program has a unique perfect model,
see [Przymusinski, 1988].


6.3.12 Example Consider Tweety2 from Example 2.5.3 again. It is (locally) 
stratified with respect to the level mapping given in Example 6.3.3. We calcu- 
late the perfect model for Tweety2 by employing powers of the operator Tp 
as discussed just prior to the statement of Theorem 6.3.9. Indeed, with the 
notation used there, we obtain 


$M_1$ = {penguin(tweety)},
$M_2$ = {bird(bob), bird(tweety), penguin(tweety)},
$M_3 = M_{Tweety2}$, and
$M_4 = M_3$.


As discussed in Example 2.5.3, the latter model is the perfect model for 
Tweety2. 


Chapter 7 


Logic Programming and Artificial 
Neural Networks 


Sebastian Bader,1 Pascal Hitzler,2 and Anthony Seda3


7.1 Introduction 


One of the ultimate goals of artificial intelligence is the creation of agents 
with human-like intelligence, and many, varied approaches have been made 
in attempts to realize this goal. Of course, an agent endowed with human- 
like intelligence should be able to represent and reason with well-structured 
data and processes, such as those encountered in logic or in mathematics and 
related subjects, just as human beings can. On the other hand, that same 
agent should also be able to represent and reason with uncertain, noisy, and 
incomplete data, again, just as human beings can, at least to a certain extent. 
Furthermore, the agent should be able to learn by example and refine the 
reasoning process as a result. 


These two aspects of the general process of reasoning and intelligence just
considered are complementary and yet are integrated in human intelligence.
Thus, their integration within a single artificial computing system is an
important objective in the search for true artificial intelligence.4 Logic-based
symbolic systems are good implementations of the first, the formal, style of
reasoning, whereas neural networks or connectionist systems are good
implementations of the second, less formal, style. They are therefore good
candidates, and indeed are among the most prominent such candidates, for
attempting this integration, with each representing one of the two aspects. Certainly,
there has been a considerable amount of interest in recent years in exactly this
integration, known as neural-symbolic integration, with a view to combining
the best of both styles of reasoning within a single system.


1MMIS, Department of Computer Science, University of Rostock, Germany.

2Kno.e.sis Center for Knowledge-Enabled Computing, Wright State University, Dayton,
Ohio, USA.

3Department of Mathematics, University College Cork, Cork, Ireland.

4See [Hitzler and Kühnberger, 2009] for a more detailed discussion of this point.

It will be worth contrasting a little further these two, very different, com- 
puting paradigms in order to appreciate better the issues involved in their 
integration. First, symbolic systems are usually based on a logic of one type 
or another. They possess a declarative semantics, and knowledge can be mod- 
elled in them in a human-like fashion. Thus, their use makes it easy to process 
knowledge and also to handle structured objects. Unfortunately, such sys- 
tems are hard to refine from real world data, which usually is noisy, and 
they are hard to design if no expert knowledge is available. They are essen- 
tially discrete models of computation and have been successfully used in many 
applications. On the other hand, artificial neural networks are a powerful ap- 
proach to machine learning, inspired by biology and neuroscience. They are 
trainable from raw data, even if the data is noisy and inconsistent, and thus 
are capable of adapting to new situations. They are, furthermore, robust in 
the sense that they degrade gracefully: even if parts of the system fail, the 
system still works. Unfortunately, they do not possess a declarative semantics 
and have difficulties in handling structured data. Available (symbolic) back- 
ground knowledge, which exists in many application domains, is also difficult 
to use in such systems. Being modelled on natural phenomena, connectionist 
systems are basically continuous models of computation, and they also have 
been used successfully in many applications. 

Figure 7.1 shows the Neural-Symbolic Cycle which depicts, in general 
terms, our approach to the process of integration followed here. Starting from 
a symbolic system, which is both readable and writable by humans, we cre- 
ate a neural or connectionist system into which the symbolic knowledge is 
embedded. The neural system can then be trained using powerful connection- 
ist training methods, which allows modification of the rules by generalization 
from raw data. If this learned or refined knowledge is later extracted from the 
neural system, we obtain a readable version of the acquired knowledge.6 In
fact, it is our intention to show in this chapter how to embed knowledge about 
semantic operators into connectionist systems. More specifically, we show how 
semantic operators of propositional logic programs P may be computed ex- 
actly by neural systems and how these same operators may be approximated 
in the case of first-order programs. One consequence of this is that a neural 
system acquires a sort of semantics. Another consequence is that this chapter 
may be viewed as providing a model of computation for the concepts of the 
previous chapters, and it deals to a certain extent with the implementation 
aspects of this model. This chapter therefore is a natural continuation of the 
earlier ones and gives an example of the use and application of certain of the 
methods we have developed. Indeed, the notion of approximation just mentioned
occurs in the context of a theorem of Funahashi, see Theorem 7.2.2,
and employs the methods of Chapters 3 and 4 in that it casts sets of
interpretations into compact metric spaces. This fact permits familiar techniques
from analysis to be employed, and their occurrence is to be expected given the
continuous nature of neural systems, as already noted. Such methods using
approximation are, in fact, forced on us if we wish to employ conventional
neural networks having only finitely many neurons because, for first-order
programs P, both $B_P$ and ground(P) are infinite sets.


5See [Bader and Hitzler, 2005, Hammer and Hitzler, 2007] for overviews of the area.

6We do not deal with knowledge extraction here, but instead refer the reader to the
papers [Jacobsson, 2005, Bader and Hitzler, 2005, Lehmann et al., 2010] for pointers to the
literature.


FIGURE 7.1: The neural-symbolic cycle.

Thus, the main objective of this chapter is to give a detailed account of the 
foundations of neural-symbolic integration, and the main contents of the chap- 
ter are as follows. First, in Section 7.2, we introduce neural networks and the 
basic definitions and notation we need throughout, including the statement 
of Funahashi’s theorem in the form in which we use it. Next, in Section 7.3, 
we discuss in some detail the so-called core method as a general and well- 
known approach to neural-symbolic integration. Indeed, it is the method we 
adopt here, and it is already summarized in the previous paragraph. In Sec- 
tion 7.4, we commence the study of the main topic of the chapter, namely, 
the process of embedding semantic operators of logic programs into neural 
networks. Thus, in Section 7.4, we start with a basic result, Theorem 7.4.1, 
applying to propositional logic programs P and due originally to Hölldobler
and Kalinke [Hölldobler and Kalinke, 1994]. This result provides a procedure
which, when given a normal propositional logic program P, shows how to 
construct a neural network which computes the Tp-operator for P. The next 
section, Section 7.5, is the heart of the chapter and takes up the issue of 
the approximate computation of the Tp-operator for first-order normal logic 
programs P. Starting with the propositional approximation of Tp based on 
the previous section, we go on to study the approximate computation of Tp 
by sigmoidal networks, radial-basis-function networks, and vector-based net- 
works, in turn, before closing the section with a discussion of the approximate 
computation of the least fixed point of the Tp-operator for definite normal 
logic programs P. It should be noted that, thus far, we have concentrated on
the $T_P$-operator, but we take up the study of the computation and the
approximate computation of other semantic operators, and their fixed points, in
Sections 7.6 and 7.7. In particular, in Section 7.6, we sketch the construction
of neural networks which extend Theorem 7.4.1 to compute the Fitting-style
operator $F_P$ for propositional normal logic programs P. Then, in Section 7.7,
we consider approximate computation for the operators $F_P$ and $GL_P$, among
others, for first-order normal logic programs P.


FIGURE 7.2: Unit $N_k$ in a connectionist network.

At certain places in this chapter, the material we present is just sketched, 
and detail is provided only to the extent to which it serves to outline the 
application area under discussion. This is simply because the inclusion of full 
detail at the places in question would lead us far astray from the main topic of 
the book. We do give ample references to the literature, however, to facilitate 
the reader who is interested in studying the relevant matters further. 


7.2 Basics of Artificial Neural Networks 


We begin by briefly summarizing what we need relating to artificial neural 
networks or just neural networks for short.7


7.2.1 Definition A neural network or connectionist network8 is simply a
weighted directed graph, or weighted digraph, endowed with extra structure,
as follows. A typical unit (or node) $N_k$ in this digraph is shown in Figure 7.2.
We denote by $I_k = \{1, \ldots, n_k\}$, say, the finite set of indices $j$ for which there
is a digraph connection from $N_j$ to $N_k$, and we let $w_{kj} \in \mathbb{R}$ denote the weight
of the digraph connection from a unit $N_j$ to a unit $N_k$, if there is such a
connection, noting that $w_{kj}$ may be 0. Then the unit $N_k$ is characterized,
at time $t$, by the following data: its input vector $(i_{k1}(t), \ldots, i_{kn_k}(t))$, where
$i_{kj}(t) = w_{kj} v_j(t)$ is the input received by $N_k$ from $N_j$ at time $t$; its
threshold $\theta_k \in \mathbb{R}$; its potential $p_k(t)$; and its value $v_k(t)$. The units are updated
synchronously; time becomes $t + \Delta t$; at each update the potential $p_k(t)$ is
calculated by means of an activation function; and the output value for $N_k$,
$v_k(t + \Delta t)$, is calculated by means of an output function whose argument is
$p_k(t)$. In fact, the activation function we will use most often in our work is the
weighted sum of the inputs minus the threshold. In other words, in most of
our discussions $p_k(t) = \left(\sum_{j=1}^{n_k} w_{kj} v_j(t)\right) - \theta_k \in \mathbb{R}$. We say that a unit $N_k$
becomes active at time $t$ if $p_k(t) \geq 0$. On the other hand, we consider a number
of different types of units distinguished mainly by their output function, as
follows. A unit is said to be a binary threshold unit if its output function is a
threshold function or Heaviside function $H$, so that


$$v_k(t + \Delta t) = H(p_k(t)) = \begin{cases} 1 & \text{if } p_k(t) \geq 0, \\ 0 & \text{otherwise.} \end{cases}$$


A unit is said to be a linear unit if its output function is the identity as a
function of $p_k(t)$ and its threshold $\theta_k$ is 0. A unit is said to be a sigmoidal unit
or a squashing unit if its output function $\phi$ is non-decreasing and is such that
$\lim_{x \to \infty} \phi(x) = 1$ and $\lim_{x \to -\infty} \phi(x) = 0$. Such functions are called squashing
functions.


7Our terminology and notation are fairly standard, and the reader is referred to the
papers [Hitzler et al., 2004, Fu, 1994, Hertz et al., 1991] for further details concerning neural
networks; in particular, we follow [Hitzler et al., 2004] closely here.

8Also called a connectionist system.
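
The update of a single unit described in Definition 7.2.1 can be sketched in a few lines of Python. The function names below are hypothetical, the logistic function is used as one common example of a squashing function, and the convention that the Heaviside function takes the value 1 at 0 is an assumption of this sketch.

```python
import math
from typing import Callable, Sequence


def potential(weights: Sequence[float], values: Sequence[float],
              threshold: float) -> float:
    """Weighted sum of the incoming values minus the threshold of the unit."""
    return sum(w * v for w, v in zip(weights, values)) - threshold


def heaviside(p: float) -> float:
    """Output function of a binary threshold unit (value 1 at p = 0 assumed)."""
    return 1.0 if p >= 0 else 0.0


def identity(p: float) -> float:
    """Output function of a linear unit (whose threshold is 0)."""
    return p


def logistic(p: float) -> float:
    """One squashing function: non-decreasing, tends to 1 at +inf and 0 at -inf."""
    return 1.0 / (1.0 + math.exp(-p))


def update(weights: Sequence[float], values: Sequence[float], threshold: float,
           output: Callable[[float], float]) -> float:
    """Value of the unit after one synchronous update step."""
    return output(potential(weights, values, threshold))


# A binary threshold unit with weights (1, -1) and threshold 0.5:
fires = update([1.0, -1.0], [1.0, 0.0], 0.5, heaviside)   # potential 0.5
silent = update([1.0, -1.0], [1.0, 1.0], 0.5, heaviside)  # potential -0.5
```

Here the first call yields an active unit and the second an inactive one, since only the first potential is non-negative.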


We will only consider connectionist networks where the units can be
organized in layers, although a variant of this will be encountered in Section 7.6.
A layer is a vector of units. An n-layer feedforward network F consists of the
input layer, $n - 2$ hidden layers, and the output layer, where $n \geq 2$. Each unit
occurring in the i-th layer is connected to each unit occurring in the (i+1)-st
layer, $1 \leq i < n$. Let $r$ and $s$ be the number of units occurring in the input
and output layers, respectively. A connectionist network F is called a
multilayer feedforward network if it is an n-layer feedforward network for some
$n$. A multilayer feedforward network F computes a function $f_F : \mathbb{R}^r \to \mathbb{R}^s$,
called the input-output mapping of F or the network function of F, as
follows. The input vector (the argument of $f_F$) is presented to the input layer
at time $t_0$ and propagated through the hidden layers to the output layer. At
each time point, all units update their potential and value, as noted above.
At time $t_0 + (n-1)\Delta t$, the output vector (the image under $f_F$ of the input
vector) is read off the output layer.

For a 3-layer feedforward network with $r$ linear units in the input layer,
squashing units in the hidden layer, and a single linear unit in the output
layer, the input-output function of the network as described in the previous
paragraph can thus be obtained as a mapping $f : \mathbb{R}^r \to \mathbb{R}$ with


$$f(x_1, \ldots, x_r) = \sum_j c_j \, \phi\Big(\sum_i w_{ji} x_i - \theta_j\Big),$$


where $c_j$ is the weight associated with the connection from the j-th unit of the
hidden layer to the single unit in the output layer, $\phi$ is the squashing output
function of the units in the hidden layer, $w_{ji}$ is the weight associated with the
connection from the i-th unit of the input layer to the j-th unit of the hidden
layer, and $\theta_j$ is the threshold of the j-th unit of the hidden layer.
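
As a minimal sketch of this input-output function, the following Python code evaluates the sum over hidden units directly. The function name `three_layer`, the steep logistic squashing function, and the particular weights are assumptions chosen for illustration only; with these weights the network roughly reproduces the exclusive-or of two binary inputs.

```python
import math
from typing import Callable, List, Sequence


def three_layer(c: Sequence[float], w: Sequence[Sequence[float]],
                theta: Sequence[float],
                phi: Callable[[float], float]) -> Callable[[Sequence[float]], float]:
    """Network function f(x) = sum_j c[j] * phi(sum_i w[j][i] * x[i] - theta[j])."""
    def f(x: Sequence[float]) -> float:
        return sum(
            c_j * phi(sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)) - th_j)
            for c_j, w_j, th_j in zip(c, w, theta)
        )
    return f


def steep_logistic(p: float) -> float:
    """A squashing function: logistic curve with slope parameter 10 (assumed)."""
    return 1.0 / (1.0 + math.exp(-10.0 * p))


# Two inputs, two hidden squashing units, one linear output unit.
c: List[float] = [1.0, -1.0]
w: List[List[float]] = [[1.0, 1.0], [1.0, 1.0]]
theta: List[float] = [0.5, 1.5]
f = three_layer(c, w, theta, steep_logistic)

values = {x: f(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

The first hidden unit responds when at least one input is 1 and the second when both are, so their difference is close to 1 exactly on the inputs (0, 1) and (1, 0).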

It is our aim to establish results in the following sections on the represen- 
tation and approximation of various semantic operators, the Tp-operator in 
particular, by input-output functions of 3-layer feedforward networks. Some 
of our results rest on the following theorem, which is due to Funahashi, see 
[Funahashi, 1989]. 


7.2.2 Theorem (Funahashi) Suppose that $\phi : \mathbb{R} \to \mathbb{R}$ is a non-constant,
bounded, monotone increasing and continuous function. Let $K \subseteq \mathbb{R}^n$ be compact,
let $f : K \to \mathbb{R}$ be a continuous function, and let $\varepsilon > 0$. Then there exists a
3-layer feedforward network F with squashing function $\phi$ whose input-output
mapping $f_F : K \to \mathbb{R}$ satisfies $\max_{x \in K} d(f(x), f_F(x)) < \varepsilon$, where $d$ is a metric
which induces the natural topology9 on $\mathbb{R}$.


In other words, each continuous function f : K — R can be uniformly 
approximated by input-output functions of 3-layer (feedforward) networks. 
Furthermore, on a point of terminology, suppose that $\varepsilon > 0$ is given. We will write
Y approximates X up to $\varepsilon$ if $d(Y, X) < \varepsilon$, where $d$ is some appropriate
metric for the objects X, Y in question.10 There are two cases here where
the definition just given will be applied, as follows. In the first case, X is a 
semantic operator and Y is an operator which we are using to approximate X; 
d is either the uniform metric used in Theorem 7.2.2 or the metric À discussed 
in Section 7.5.2. In the other case, X is a fixed point of a semantic operator and 
Y is an interpretation which we are using to approximate X; d is the metric 
$d_l$ determined by a level map (taking values in $\omega$) as in Definition 5.1.3, see
again Section 7.5.2 and also Section 7.5.6. We will paraphrase the import of
Theorem 7.2.2, noting that it holds for all $\varepsilon > 0$, by writing that approximating
networks exist for f. Furthermore, for our purposes later, it will suffice to 
assume that K is a compact subset of the set of real numbers, so that n can 
be taken to be equal to 1 in the statement of the theorem. 

An n-layer recurrent network F consists of an n-layer feedforward network 
such that the number of units in the input layer is equal to the number of units 
in the output layer. Furthermore, each unit in the output layer is connected 
with weight 1 to the unit in the corresponding position in the input layer. 
Figure 7.3 shows a 3-layer recurrent network. The subnetwork consisting of 
the three layers and the connections between the input and the hidden layer 
as well as between the hidden and the output layer is a 3-layer feedforward 
network called the kernel of F. 

Notice that any neural network in which the number of units in the input
layer is equal to the number of units in the output layer can be made
recurrent just by adding the necessary obvious connections with weight 1. Notice
also that a recurrent network can perform iterated computations because the
output values can be returned to the input layer via the connections just
described; it can thus perform computation of the iterates $T^k(I)$, $k \in \mathbb{N}$, for
example, where $I$ is an interpretation and $T$ is a semantic operator.


9For example, $d(x, y) = |x - y|$.
10The fact that d is symmetric will not render this definition ambiguous, because in
practice it will be clear which object is which.


FIGURE 7.3: Sketch of a 3-layer recurrent network containing, from left to
right, 3 input, 4 hidden, and 3 output units and showing also the recurrent
connections from output layer to input layer.


7.3 The Core Method as a General Approach to 
Integration 


In this section, we outline the idea underlying the approach presented
below. Suppose given a normal logic program P and any one of the semantic
operators $\mathcal{T}_P : \mathcal{I}_P \to \mathcal{I}_P$ we have thus far associated with P, using $\mathcal{T}_P$ and
$\mathcal{I}_P$ as generic symbols for a semantic operator and its underlying set of
interpretations. For simplicity, we assume the interpretations in question are
Herbrand interpretations taking values in a truth set $\mathcal{T}$, although the
conclusions we make here are valid over any preinterpretation J whose domain
D is countable. Can one find, or at least show the existence of, a multilayer
feedforward network $F_P$ which computes $\mathcal{T}_P$ in some sense? Furthermore, can
this network $F_P$, or some other appropriate network, compute the least fixed
point of $\mathcal{T}_P$ assuming the least fixed point of $\mathcal{T}_P$ exists?

A few general remarks are in order at this point. To begin with,
multilayer feedforward networks, even 3-layer feedforward networks, are known to
be extremely powerful computing devices and indeed are known to be
universal approximators in the sense made precise in the statement of Funahashi’s
theorem, Theorem 7.2.2, earlier.11 Therefore, one might expect them to have
the ability to carry out the required computations, and this is so. Indeed,
suppose that P is a first-order program and endow $\mathcal{I}_P$ with the Cantor topology,
assuming that the set $\mathcal{T}$ of truth values is finite. Then we obtain a compact
Hausdorff space homeomorphic to the Cantor subset of the unit interval in the
real line as shown in Theorem 3.3.4. Thus, whenever $\mathcal{T}_P$ is continuous in the
Cantor topology on $\mathcal{I}_P$ (see Theorem 7.5.3), we can apply Theorem 7.2.2,
taking $f = f_{\mathcal{T}_P}$, taking $K = \mathcal{I}_P$, and given a value of $\varepsilon > 0$, to assert the existence
of a 3-layer feedforward network satisfying the conclusion of Theorem 7.2.2.
Furthermore, by making such a network recurrent, it can also compute
iterates of $\mathcal{T}_P$ provided that conditions prevail under which the error estimate
is uniformly well-behaved relative to $\varepsilon$ under iteration. Again, under suitable
conditions and with a suitable choice of initial input $I_0 \in \mathcal{I}_P$ (perhaps the
bottom element of $\mathcal{I}_P$), the iterates $f_{F_P}^k(I_0)$ will converge to a fixed point (perhaps
the least) of $\mathcal{T}_P$, and these observations will be examined in Sections 7.5.2 and
7.5.6, see also Corollary 7.4.3. Finally, as one might expect, if P is actually a
propositional program, then the need for approximation disappears, and
indeed a 3-layer network can be constructed which actually computes $\mathcal{T}_P$ and,
again under suitable conditions, computes fixed points of $\mathcal{T}_P$. In fact, in the
case of propositional programs, networks of binary threshold units suffice for
these purposes, as we shall see. This general method is nowadays known as
the core method, and a number of instances of it are presented in the following
sections.


11See [Funahashi, 1989, Hornik et al., 1989] for full details.

It is important to note that the proof of Theorem 7.2.2 is non-constructive, 
and much of our work in the following sections of this chapter is concerned with 
the problem of constructing suitable approximations to semantic operators 
in the case of first-order programs.12 However, we will begin by discussing
propositional programs in these terms in the next section. 


7.4 Propositional Programs 


The previous section delineates the problem we wish to study in this
chapter, and we begin by studying the propositional case first relative to the
immediate consequence operator. Before doing this, however, we note that networks
yet simpler than those just described, namely, 2-layer feedforward networks
of binary threshold units, do not in general suffice to compute the immediate
consequence operator for (definite) propositional logic programs, although we
give no details of this claim here.13


We now present the main result of this section.14


12We know of no constructive proof of Theorem 7.2.2 and refer the reader to the papers
[Cybenko, 1989, Funahashi, 1989, Hornik et al., 1989] for well-known versions of the proof.

13See [Hitzler et al., 2004] for a discussion of this fact.

14This result was first established in [Hölldobler and Kalinke, 1994]; here, and in the rest
of this section, we follow [Hitzler et al., 2004].


7.4.1 Theorem For each propositional normal logic program P, a 3-layer
feedforward network can be constructed which computes the immediate
consequence operator $T_P$.


Proof: Let $m$ and $n$ be the number of propositional variables and the number
of clauses occurring in P, respectively. Without loss of generality, we may
assume that the variables are ordered. The network associated with P can
now be constructed by the following translation algorithm.


(1) Both the input and output layers are vectors of binary threshold units of
length $m$, where the i-th unit in either of these layers represents the i-th
variable, $1 \leq i \leq m$. The threshold of each unit occurring in the input or
output layer is set to 0.5.


(2) For each clause of the form $A \leftarrow L_1, \ldots, L_k$, $k \geq 0$, occurring in P, do
the following.


(2.1) Add a binary threshold unit $c$ to the hidden layer.


(2.2) Connect $c$ to the unit representing $A$ in the output layer with weight
1.


(2.3) For each literal $L_j$, $1 \leq j \leq k$, connect the unit representing $L_j$ in
the input layer to $c$ and, if $L_j$ is an atom, then set the weight to 1;
otherwise, set the weight to $-1$.


(2.4) Set the threshold $\theta_c$ of $c$ to $l - 0.5$, where $l$ is the number of positive
literals occurring in $L_1, \ldots, L_k$.


Each interpretation $I$ for P can be represented by a binary vector
$(v_1, \ldots, v_m)$. Such an interpretation is given as input to the network by
externally activating corresponding units of the input layer at time $t_0$. It remains
to show that $T_P(I)(A) = \mathbf{t}$ if and only if the unit representing $A$ in the output
layer becomes active at time $t_0 + 2\Delta t$.

If $T_P(I)(A) = \mathbf{t}$, then there is a clause $A \leftarrow L_1, \ldots, L_k$ in P such that
for all $1 \leq j \leq k$ we have $I(L_j) = \mathbf{t}$. Let $c$ be the unit in the hidden layer
associated with this clause according to (2.1) of the construction. From (2.3)
and (2.4) we conclude that $c$ becomes active at time $t_0 + \Delta t$. Consequently,
(2.2) and the fact that units occurring in the output layer have a threshold of
0.5 (see Step (1) of the construction) ensure that the unit representing $A$ in
the output layer becomes active at time $t_0 + 2\Delta t$.

Conversely, suppose that the unit representing the atom $A$ in the output
layer becomes active at time $t_0 + 2\Delta t$. From the construction of the network,
we find a unit $c$ in the hidden layer which must have become active at time
$t_0 + \Delta t$. This unit is associated with a clause $A \leftarrow L_1, \ldots, L_k$. If $k = 0$,
that is, if the body of the clause is empty, then, according to (2.4), $c$ has
a threshold of $-0.5$. Furthermore, according to (2.3), $c$ does not receive any
input, that is, $p_c = 0 + 0.5$, and consequently $c$ will always be active. Otherwise,
if $k \geq 1$, then $c$ becomes active only if each unit in the input layer representing
a positive literal and no unit representing a negative literal in the body of
the clause is active at time $t_0$ (see (2.3) and (2.4)). Hence, we have found a
clause $A \leftarrow L_1, \ldots, L_k$ such that for all $1 \leq j \leq k$ we have $I(L_j) = \mathbf{t}$, and
consequently $T_P(I)(A) = \mathbf{t}$. ∎


FIGURE 7.4: Two 3-layer feedforward networks of binary threshold units
computing $T_{P_1}$ and $T_{P_2}$, respectively. Only connections with non-zero weight
are shown. The numbers occurring within units denote thresholds.
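
The translation algorithm used in the proof of Theorem 7.4.1 can be sketched in Python as follows. The clause encoding, the function names, and the four-clause test program are hypothetical and serve only as an illustration; the sketch also assumes that no atom occurs more than once in the body of a clause, and it models the externally activated input layer simply by the binary input vector.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Clause: (head, atoms occurring positively, atoms occurring negated in the body).
Clause = Tuple[str, Sequence[str], Sequence[str]]
Network = Tuple[List[List[float]], List[float], List[List[float]]]


def build_network(atoms: Sequence[str], program: Sequence[Clause]) -> Network:
    """Steps (2.1)-(2.4): one hidden binary threshold unit per clause."""
    index = {a: i for i, a in enumerate(atoms)}
    w_in, theta, w_out = [], [], []
    for head, pos, neg in program:
        row = [0.0] * len(atoms)
        for a in pos:
            row[index[a]] = 1.0       # (2.3) weight 1 for an atom
        for a in neg:
            row[index[a]] = -1.0      # (2.3) weight -1 for a negated atom
        w_in.append(row)
        theta.append(len(pos) - 0.5)  # (2.4) threshold l - 0.5
        out = [0.0] * len(atoms)
        out[index[head]] = 1.0        # (2.2) weight 1 to the head's output unit
        w_out.append(out)
    return w_in, theta, w_out


def heaviside(p: float) -> int:
    return 1 if p >= 0 else 0


def run(network: Network, v: Sequence[int]) -> List[int]:
    """Propagate a binary input vector through hidden and output layers."""
    w_in, theta, w_out = network
    hidden = [heaviside(sum(w * x for w, x in zip(row, v)) - th)
              for row, th in zip(w_in, theta)]
    # Step (1): every output unit has threshold 0.5.
    return [heaviside(sum(h * out[i] for h, out in zip(hidden, w_out)) - 0.5)
            for i in range(len(v))]


def tp(atoms: Sequence[str], program: Sequence[Clause], v: Sequence[int]) -> List[int]:
    """Reference implementation of the immediate consequence operator."""
    true = {a for a, bit in zip(atoms, v) if bit == 1}
    heads = {h for h, pos, neg in program
             if all(a in true for a in pos) and not any(b in true for b in neg)}
    return [1 if a in heads else 0 for a in atoms]


# Hypothetical test program: p <-   q <- p, not r   r <- not p   s <- q, s
ATOMS = ["p", "q", "r", "s"]
PROGRAM: List[Clause] = [("p", [], []), ("q", ["p"], ["r"]),
                         ("r", [], ["p"]), ("s", ["q", "s"], [])]
NET = build_network(ATOMS, PROGRAM)
agrees = all(run(NET, v) == tp(ATOMS, PROGRAM, v)
             for v in product((0, 1), repeat=len(ATOMS)))
```

The final check compares the network output with the reference operator on all sixteen interpretations of the test program, mirroring the two directions of the proof on a concrete instance.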


7.4.2 Example As an example of Theorem 7.4.1, consider the following two
programs $P_1$ (on the left) and $P_2$ (on the right):


    C ← A, ¬B        A ←
    C ← ¬A, B        C ← A, ¬B
                     C ← ¬A, B


Their corresponding connectionist networks are shown in Figure 7.4. One
should observe that $P_2$ exemplifies the representation of unit clauses in
3-layer feedforward networks.15


It is worth noting that the number of units and the number of connections 
in a network F corresponding to a program P are bounded by O(m + n) and
O(m × n), respectively, where m is the number of propositional variables and
n is the number of clauses occurring in P. Furthermore, $T_P(I)$ is computed in
two steps. As the sequential time to compute $T_P(I)$ is bounded by O(n × m)
(assuming that no literal occurs more than once in the conditions of a clause),
the parallel computational model is optimal.¹⁶

We mention in passing and in the context of Theorem 7.4.1 that one can 
apply the Banach contraction mapping theorem, Theorem 4.2.3, to obtain the 
following result. 


7.4.3 Corollary Let P be a normal propositional logic program such that
$T_P$ is a contraction with respect to some (necessarily complete) metric. Then
a 3-layer recurrent network can be constructed such that each computation,
starting with an arbitrary initial input, converges and yields the unique fixed
point of $T_P$ or, in other words, yields the unique supported model for P.


15 We can save the unit in the hidden layer corresponding to the unit clause if we change
the threshold of the unit representing A in the output layer to -0.5.

16 A parallel computational model requiring p(n) processors and t(n) time to solve a
problem of size n is optimal if p(n) × t(n) = O(T(n)), where T(n) is the sequential time to
solve this problem, see, for example, [Karp and Ramachandran, 1990].


Indeed, a kind of converse of Corollary 7.4.3 also holds, as follows.
Let P be a propositional logic program such that the corresponding network
has the property that each computation starting with an arbitrary initial input
converges, and in all cases converges to the same state. Then
iteration of the $T_P$-operator exhibits the same behaviour, that is, for each
initial interpretation it yields one and the same constant value after a finite
number of iterations. This fact suffices to guarantee the existence of a complete
metric which renders $T_P$ a contraction, and the claim therefore follows.¹⁷

Returning to the programs $P_1$ and $P_2$ again, we observe that the
associated $T_P$-operators are contractions.¹⁸ Hence, Figure 7.4 shows the kernels
of corresponding recurrent networks which compute the least fixed point of
$T_{P_1}$ (the interpretation represented by the vector (0,0,0)) and of $T_{P_2}$ (the
interpretation represented by the vector (1,0,1)).
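
The convergence behaviour described here is easy to check by brute force. The sketch below iterates the operator itself rather than a network, which by Theorem 7.4.1 computes the same function; it starts from each of the eight possible initial states and records where the iteration settles. Interpretations are written as 0/1 vectors in the order (A, B, C); the helper names `settle` and `limits` are illustrative only.

```python
from itertools import product

ATOMS = ("A", "B", "C")
# A clause is (head, positive body atoms, negative body atoms).
P1 = [("C", ("A",), ("B",)), ("C", ("B",), ("A",))]
P2 = [("A", (), ())] + P1


def tp(program, vector):
    """Apply T_P to an interpretation given as a 0/1 vector over ATOMS."""
    true = {a for a, v in zip(ATOMS, vector) if v}
    derived = {head for head, pos, neg in program
               if true.issuperset(pos) and true.isdisjoint(neg)}
    return tuple(1 if a in derived else 0 for a in ATOMS)


def settle(program, start, max_steps=10):
    """Iterate T_P from start; return (stable state, number of iterations)."""
    state = start
    for step in range(max_steps):
        nxt = tp(program, state)
        if nxt == state:
            return state, step
        state = nxt
    raise RuntimeError("no stable state reached")


def limits(program):
    """Set of stable states reached from all eight initial states."""
    return {settle(program, s)[0] for s in product((0, 1), repeat=3)}
```

For both programs `limits` returns a single state, namely (0, 0, 0) for $P_1$ and (1, 0, 1) for $P_2$, in agreement with the vectors quoted above.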

The time needed by the network to settle down into the unique stable state
is equal to the time needed by a sequential machine to compute the least fixed
point of $T_P$ in the worst case. As an example, consider the definite program
P as follows, where $1 \leq i < n$:

$$
\begin{array}{l}
A_1 \leftarrow \\
A_{i+1} \leftarrow A_i
\end{array}
$$

The least fixed point of $T_P$ is the interpretation which evaluates each $A_i$,
$1 \leq i \leq n$, to t, and it can be computed in O(n) steps.¹⁹ Obviously, the
parallel computational model needs as many steps. More generally, let P be a
propositional definite program containing n clauses. The time needed by the
network to settle down into the unique stable state is 3n in the worst case,
and thus, the time is linear with respect to the number of clauses occurring in
the program. This comes as no surprise as satisfiability of propositional Horn
formulae is P-complete and, thus, is unlikely to be in the class NC.²⁰ On the
other hand, consider the program $P'$ containing the following clauses

$$
\begin{array}{l}
A_i \leftarrow \\
A_{i+1} \leftarrow A_i
\end{array}
$$


where $1 \leq i < n$ and i is even. The least model for $P'$ maps each atom to t
and is computed in five steps by the recurrent network corresponding to $P'$.


17 See [Hitzler and Seda, 2001, Bessaga, 1959, Jachymski, 2000]; a direct proof of this
observation is given in [Hölldobler and Kalinke, 1994].

18 These programs are actually acceptable, as can be seen by mapping C to 2 and A as
well as B to 1 and considering the model I(A) = I(C) = t and I(B) = f.

19 Using techniques described in [Dowling and Gallier, 1984] and [Scutella, 1990]. To be
more precise, the algorithm described in [Dowling and Gallier, 1984] needs O(n) time, where
n denotes the total number of occurrences of propositional variables in the formula.

20 See, for example, [Jones and Laaser, 1977] and [Karp and Ramachandran, 1990].

We note that the networks constructed by the translation algorithm pre- 
sented previously cannot be trained by the usual learning methods applied 
to connectionist systems. It was observed in [d’Avila Garcez et al., 1997] (see 
also [d’Avila Garcez and Zaverucha, 1999, d’Avila Garcez et al., 2002]) that 
results similar to Theorem 7.4.1 and Corollary 7.4.3 can be obtained if the 
binary threshold units occurring in the hidden layer of the feedforward kernels 
are replaced by sigmoidal units. We omit the technical details here and refer 
to the above cited literature. Such a move renders the kernels accessible to 
the backpropagation algorithm, a standard technique for training feedforward 
networks [Rumelhart et al., 1986]. 


7.5 First-Order Programs 


A central problem for neural-symbolic integration is the determination of 
a good representation of first-order rules within a connectionist setting. Such 
a representation would result, at least, in the computation or approximation 
of the associated semantic operators. That approximating networks exist for 
the immediate consequence operators of acyclic logic programs was the first 
result obtained in this regard, see [Hölldobler et al., 1999], but it was shown
with the help of Funahashi’s theorem, which is non-constructive as we have 
already observed. In this section, we outline the ideas underlying the general 
problem and also discuss different constructive approaches to it. But before 
going into details, we need to answer the following questions. 


• Why do we need to approximate operators such as the $T_P$-operator?
• What does approximation mean in our context?


The first question is easily answered: even a single application of the Tp- 
operator can lead to infinite results. For example, assume P is a program 
containing the fact p(X). Applying the Tp-operator once (to an arbitrary 
interpretation) leads to a result containing infinitely many atoms, namely, all 
p(X)-atoms for every X. In this simple example, we might be able to represent 
this particular result in a finite way, but things might become arbitrarily 
complex for other programs using the same or similar representations.²¹


21 Indeed, the so-called rational models were developed to tackle this representational
problem for certain programs, see [Bornscheuer, 1996]. Unfortunately, there is no way to 
compute an upper bound on the size of this rational representation, and hence it does not 
give us any immediate advantages. Because we are not aware of any other finite represen- 
tation, we will concentrate here on the standard representation using Herbrand interpreta- 
tions. 




In principle, there are two ways to approximate a given Tp-operator. On 
the one hand, we can design an approximating function to meet a given level 
of accuracy. This leads, as accuracy increases, to increasing numbers of units 
in the hidden layer in the resulting networks, and we call this method approx- 
imation in space. The approaches presented in this section follow this line of 
attack. Alternatively, we can construct a system which approximates a single 
application of the Tp-operator better and better the longer it runs, and we 
call this method approximation in time.²²

Our discussion here has concentrated on the operator Tp, but all our con- 
siderations apply equally well to any of the other semantic operators we have 
studied, and we will return to this point in Sections 7.6 and 7.7. However, 
unless stated to the contrary, for a given normal logic program P, we will 
focus on the operator Tp and the space Ip of two-valued interpretations in 
Section 7.5.1 through to Section 7.5.6. 


7.5.1 Feasibility of the First-Order Approach 


As mentioned previously, it is well-known that multilayer feedforward net- 
works are universal approximators for certain real functions and, in particular, 
for all continuous real functions on compact subsets of $\mathbb{R}^n$. Hence, if we can
find a suitable way of representing first-order interpretations by (finite vectors 
of) real numbers, say, then feedforward networks may be used to approxi- 
mate the meaning function of suitable programs. It is necessary of course that 
such representations are compatible with both the logic-programming and the 
neural-network paradigms. 


7.5.1 Program (Even2) We use the following variant of the program Even,
Program 2.1.3, as a running example. The equations on the right define a level
mapping l assigning odd numbers to $even(s^i(a))$-atoms and even numbers to
$odd(s^i(a))$-atoms.

$$
\begin{array}{ll}
even(a) \leftarrow & l(even(s^i(a))) := 2i + 1 \\
even(s(X)) \leftarrow odd(X) \qquad & l(odd(s^i(a))) := 2i + 2 \\
odd(X) \leftarrow \neg even(X) &
\end{array}
$$


We next define a homeomorphic embedding of the space of interpretations
of a given normal logic program into some (compact) subset of the real
numbers. In doing this, we use level mappings²³ to realize this embedding.
For much of this chapter, although not everywhere, we assume that the level
mapping in question is bijective, even though some of the results we discuss
can be extended to the case of non-bijective level mappings.²⁴


22 This method was employed in [Bader and Hitzler, 2004] and [Bader et al., 2005a].

23 We are following [Hölldobler et al., 1999] here.

24 See [Seda, 2006], for example, where the requirement on level mappings $l : B_P \to \omega$ is
the already familiar one that $l^{-1}(n)$ be a finite set for each n.

FIGURE 7.5: Transforming $T_P$ into $f_P$.


7.5.2 Definition Let $l : B_P \to \omega$ be a bijective level mapping defined on the
Herbrand base $B_P$ of some normal logic program P, and let b be a natural
number such that b > 2. We define a function $\iota$ on $I_P$ by setting

$$\iota(I) = \sum_{A \in I} b^{-l(A)}$$

for each $I \in I_P$.


In fact, $\iota$ assigns to each interpretation I a binary representation $\iota(I)$ in the
number system with base b, and moreover $\iota$ is an embedding of $I_P$ into the number
system with base b. It is straightforward to show that $\iota$ is a homeomorphism,
and it follows from Theorem 3.3.4 that not only is the set $K \subseteq [0,1]$ of all
embedded interpretations compact, but that it is also homeomorphic to the
Cantor set whenever $I_P$ is endowed with the Cantor topology. Using $\iota$, we can
construct the real-valued version $f_P = \iota(T_P)$ of the immediate consequence
operator $T_P$ by defining $f_P(x) := \iota(T_P(\iota^{-1}(x)))$ or, in other words, by forcing
the diagram in Figure 7.5 to commute.
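
For finite interpretations, the embedding and its inverse can be computed exactly. The following minimal sketch uses the level mapping of Program 7.5.1 and b = 3, and represents the atom $pred(s^i(a))$ as the pair `(pred, i)`. This encoding and the function names are choices made here for illustration; infinite interpretations, which arise as limits, are not covered.

```python
from fractions import Fraction


def level(pred, i):
    """Level mapping of Program 7.5.1 for the atom pred(s^i(a))."""
    return 2 * i + 1 if pred == "even" else 2 * i + 2


def iota(interp, b=3):
    """Embed a finite interpretation (a set of (pred, i) atoms) into [0, 1]."""
    return sum((Fraction(1, b ** level(p, i)) for p, i in interp), Fraction(0))


def decode(x, n, b=3):
    """Recover the atoms of level <= n from an embedded interpretation x."""
    atoms = set()
    for lev in range(1, n + 1):
        digit = int(x * b ** lev) % b  # lev-th digit of x in base b
        if digit == 1:
            if lev % 2:
                atoms.add(("even", (lev - 1) // 2))
            else:
                atoms.add(("odd", (lev - 2) // 2))
    return atoms
```

For example, `iota({("even", 0), ("odd", 0)})` is 1/3 + 1/9 = 4/9, and `decode` inverts `iota` on every interpretation whose atoms all have level at most n. Only the digits 0 and 1 occur, which is why a base b > 2 keeps distinct interpretations apart.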

Furthermore, since $\iota$ is a homeomorphism, it follows that $f_P$ is
continuous if and only if $T_P$ is continuous in the Cantor topology on $I_P$. Now,
using Funahashi’s result, Theorem 7.2.2, we can conclude that approximating 
networks exist for suitable programs, namely, those for which the immediate 
consequence operator Tp is continuous in the Cantor topology on Ip. 

Conversely, suppose that P is a normal logic program and that approx- 
imating networks exist for Tp. Then Tp must be continuous in the Cantor 
topology on $I_P$, and we have the following theorem.²⁵


7.5.3 Theorem Suppose that P is a normal logic program. Then approxi- 
mating networks exist for Tp if and only if Tp is continuous in the Cantor 
topology on Ip. 


25 See [Seda, 2006, Theorem 3.24]. In fact, the theorem just cited was established for
Fitting-style operators (over finite truth sets, not just for two truth values).

FIGURE 7.6: The embedding of the $T_P$-operator for Program 7.5.1.


Thus, at this point, we know that approximating networks exist for suitable 
normal logic programs, but we do not yet know how to construct them. This 
issue will be taken up in the following sections. 

Before discussing the constructions in detail, we will take a closer look 
at the space of embedded interpretations and at the embedding of the $T_P$-operator
associated with Program 7.5.1. Using the embedding $\iota$ defined above
with b = 3 and taking the level mapping shown in Program 7.5.1, we obtain 
the embedding of the Tp-operator shown in Figure 7.6. As already mentioned 
earlier, the space Ip of interpretations is homeomorphic to the Cantor set. This 
can also be seen by looking at the domain of the graph shown in Figure 7.6. 


7.5.2 First-Order Programs by Propositional Approximation 


By completely grounding a first-order program P, that is, by forming the 
set ground(P), we obtain a de facto propositional version of it. In particular, 
the associated immediate consequence operators of P and of ground(P) are 
identical. Unfortunately, the ground version of most programs of interest turns
out to be an infinite set. Nevertheless, an important point is that we
can approximate the immediate consequence operator of P by taking the
immediate consequence operator of a subset of ground(P) instead, and we
consider this process now.

It will be helpful to say first a few words about the metrics which are
useful in the process.²⁶ Suppose $l : B_P \to \omega$ is a level mapping,²⁷ and form
the metric $d_l$ induced by l, see Definition 5.1.3. Then we can define a metric
$\Delta$ on the set of all mappings from $I_P$ to $I_P$ by²⁸

$$\Delta(f, g) = \sup_{I \in I_P} d_l(f(I), g(I)),$$

for $f, g : I_P \to I_P$. Similarly, we write $|\iota(f) - \iota(g)|$ to denote the uniform
metric $\sup_{x \in K} |\iota(f)(x) - \iota(g)(x)|$ defined on the set of all functions mapping
K into itself. Of course, the definition for $\Delta$ just given can be made generally
and not just for $d_l$, but this suffices for what we want to say here.


26 We refer the reader to [Seda, 2006, Section 3.1] for more details.
27 It is enough for l to satisfy the property that $l^{-1}(n)$ is finite for each n.
28 The supremum can be replaced by maximum if f and g are continuous.


Now, given
a level n, we form the subset $P_n$ of ground(P) containing all those clauses
whose heads have level $\leq n$. Then, for all $A \in B_P$ with $l(A) \leq n$ and for all
$I \in I_P$, we have $A \in T_{P_n}(I)$ if and only if $A \in T_P(I)$, or equivalently, by
definition of $d_l$, we have $d_l(T_{P_n}(I), T_P(I)) \leq 2^{-(n+1)}$ for all $I \in I_P$. Hence,
$\Delta(T_{P_n}, T_P) \leq 2^{-(n+1)}$. Now suppose that $\varepsilon > 0$ is given. Choose $n \in \mathbb{N}$ so
large that $\sum_{i > n} b^{-i} < \varepsilon$, and form $P_n$. Then for all $I \in I_P$, $T_{P_n}(I)$ and $T_P(I)$
agree on all atoms A with $l(A) \leq n$. Therefore, the expansions $\iota(T_{P_n}(I))$ and
$\iota(T_P(I))$ agree in their first n terms. Hence, for all $I \in I_P$ we have, from
Figure 7.5, that

$$|f_{P_n}(\iota(I)) - f_P(\iota(I))| = |\iota(T_{P_n}(I)) - \iota(T_P(I))| < \varepsilon.$$


In other words, given any $\varepsilon > 0$, we obtain the approximation $|f_{P_n} - f_P| < \varepsilon$
provided n is sufficiently large. In addition, approximation can be thought of in
terms of $d_l$ and $\Delta$ at the level of interpretations and of $T_P$ itself independently
of the embedding $\iota$ chosen. We refer to this process of working with $P_n$ as
approximating $T_P$ up to level n, and we will see shortly that it can be used to
show that approximating networks exist for $T_P$ for certain programs P. Indeed,
in this terminology the estimates just made show that $T_{P_n}$ approximates $T_P$
up to $\varepsilon$ provided $T_{P_n}$ approximates $T_P$ up to level n for large enough n.

Unfortunately, the subsets $P_n$ of ground(P) which, as we have just seen,
are appropriate for approximation can be infinitely large. For example, there
are infinitely many ground instances of the clause $a \leftarrow p(X)$. Therefore, we
consider only so-called covered logic programs in the rest of this section,
excluding Section 7.5.6, and we define the notion of a covered program next.
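
The choice of the level n for a given accuracy can be made explicit, since the tail of the geometric series has the closed form $\sum_{i > n} b^{-i} = b^{-n}/(b-1)$. The sketch below computes the smallest admissible n with exact rational arithmetic; the function names `tail` and `level_for_accuracy` are illustrative only.

```python
from fractions import Fraction


def tail(n, b):
    """Sum of b**-i over all i > n, which equals b**-n / (b - 1)."""
    return Fraction(1, b ** n * (b - 1))


def level_for_accuracy(eps, b):
    """Smallest level n such that tail(n, b) < eps."""
    n = 0
    while tail(n, b) >= eps:
        n += 1
    return n
```

For b = 3 the tails are 1/2, 1/6, 1/18, 1/54, and so on, so an accuracy of 0.04 is first reached at level 3, where the tail is 1/54.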


7.5.4 Definition A logic program is called covered if it has no local variables, 
that is, if every variable symbol occurring in the body of a clause also occurs 
in the head of the same clause. 


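7.5.5 Proposition Let P be a covered logic program, let l be a bijective level
mapping from $B_P$ to $\omega$, and let $n \in \omega$ be fixed. Then the program $P_n$ defined
above by

$$P_n := \{C \mid C \in ground(P) \text{ with } l(H) \leq n, \text{ where } H \text{ is the head of } C\}$$

is finite.


Proof: Since l is bijective, exactly one ground atom has any given level m,
and since P is covered, each clause of P has at most one ground instance with
that atom as its head. Hence only finitely many ground clauses have a head of
level m, and the finiteness of $P_n$ follows. ∎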


Using this finiteness property, we can directly obtain the following theorem
showing the existence of approximating networks for a given covered logic
program.


7.5.6 Theorem Let P be a covered logic program, and let $n \in \mathbb{N}$. Then we
can construct a 3-layer feedforward network whose network function
approximates $T_P$ up to level n.


Proof: We can obtain such an approximating network by

(1) Constructing $P_n$ as defined above.

(2) Using the construction presented in the proof of Theorem 7.4.1 to obtain
a network computing $T_{P_n}$.

Since $T_{P_n}$ coincides with $T_P$ for all atoms of level $\leq n$, we conclude that the
network we have constructed approximates $T_P$ up to level n, as required. ∎


7.5.7 Example Take P to be Program 7.5.1 introduced earlier. We obtain
the corresponding programs $P_n$ by means of the level mapping defined in
Program 7.5.1. The level of the head atom of each clause is shown below on the
right.

$$
\begin{array}{lll}
P_1 = \{ & even(a) \leftarrow \} & l(even(a)) = 1 \\[1ex]
P_2 = \{ & even(a) \leftarrow, & l(even(a)) = 1 \\
 & odd(a) \leftarrow \neg even(a) \} & l(odd(a)) = 2 \\[1ex]
P_3 = \{ & even(a) \leftarrow, & l(even(a)) = 1 \\
 & odd(a) \leftarrow \neg even(a), & l(odd(a)) = 2 \\
 & even(s(a)) \leftarrow odd(a) \} & l(even(s(a))) = 3
\end{array}
$$


The corresponding networks are shown in Figure 7.7. 
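
The programs $P_n$ of this example can be generated mechanically, because the level mapping of Program 7.5.1 is bijective onto the levels 1, 2, 3, ... and the program is covered, so the head atom determines each ground instance. The following is a minimal sketch with the atom $pred(s^i(a))$ encoded as `(pred, i)` and clauses as `(head, positive body, negative body)`; the encoding and the function names are assumptions made for illustration.

```python
def atom_of_level(m):
    """Inverse of the level mapping: level m >= 1 gives (pred, i) for pred(s^i(a))."""
    return ("even", (m - 1) // 2) if m % 2 else ("odd", (m - 2) // 2)


def ground_clauses_with_head(atom):
    """Ground instances of Program 7.5.1 whose head is the given atom."""
    pred, i = atom
    if pred == "odd":                      # odd(X) <- not even(X)
        return [(atom, [], [("even", i)])]
    if i == 0:                             # even(a) <-
        return [(atom, [], [])]
    return [(atom, [("odd", i - 1)], [])]  # even(s(X)) <- odd(X)


def p_n(n):
    """All ground clauses whose head has level <= n."""
    return [clause for m in range(1, n + 1)
            for clause in ground_clauses_with_head(atom_of_level(m))]


def show(atom):
    """Render (pred, i) as pred(s(...s(a)...))."""
    pred, i = atom
    return pred + "(" + "s(" * i + "a" + ")" * i + ")"


def show_clause(clause):
    """Render a clause with '<-' for the arrow and 'not' for negation."""
    head, pos, neg = clause
    body = [show(a) for a in pos] + ["not " + show(a) for a in neg]
    return (show(head) + " <- " + ", ".join(body)).rstrip()
```

Calling `[show_clause(c) for c in p_n(3)]` yields the three clauses of $P_3$ listed above, in order of the level of their heads.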


7.5.3 Approximation by Sigmoidal Networks 


In this section, we take a different approach to the approximation of the 
embedded meaning function. We start by presenting the underlying intuitions 
and continue with a detailed discussion.²⁹

Using the embedding $\iota$ defined earlier for b = 3 and the level mapping
shown in Program 7.5.1, we obtain the embedding of the Tp-operator shown 
in Figure 7.8 on the left. Under the condition that P is covered and the 
level mapping l is bijective, we can approximate this graph using a set of 
appropriately chosen constant pieces. These, in turn, can be computed as a 
sum of threshold functions, shown in Figure 7.8 in the middle. By replacing 
the threshold functions by sigmoidals, we obtain an approximation which can 
directly be implemented within a neural network. 


29 The interested reader is referred to [Bader et al., 2005b] and [Bader, 2009] for further
details and for implementations.

FIGURE 7.7: The networks corresponding to $P_1$, $P_2$, and $P_3$ from
Example 7.5.7.

FIGURE 7.8: The embedding of the $T_P$-operator of Program 7.5.1 is shown
on the left. In the middle and on the right, approximations using threshold
and sigmoidal functions are depicted.

1. Approximating the embedded $T_P$-operator using constant pieces. As
before, we start by constructing $P_n$ for a given level n. After embedding
the approximating operator $T_{P_n}$, we find that the resulting function is
a piecewise constant function. Due to the finiteness of the resulting
program, we obtain the greatest relevant input level by taking the maximal
level of an atom occurring in any of the bodies. Since no atom of a
greater level influences the result of the $T_{P_n}$-operator, we see that it is a
piecewise constant function.


2. Approximating the embedded $T_P$-operator using threshold functions.
Obviously, every piecewise constant function $\mathbb{R} \to \mathbb{R}$ can be represented as
a sum of (parametrized) threshold functions. To approximate the embedded
$T_P$-operator of Program 7.5.1 up to level 3, we need the three
functions $H^{0.016}_{0.042}$, $H^{-0.078}_{0.167}$, and $H^{0.016}_{0.292}$, where $H^{h}_{p}(x) := h \cdot H(x - p)$ denotes
an h-step at position p.


3. Approximating the embedded $T_P$-operator using sigmoidal functions. To
enable the construction of sigmoidal networks, we need to replace the
threshold functions with sigmoidal functions. This can be done because
(a) we are only interested in the approximation of embedded interpretations,
and (b) we can place the threshold functions so that the jumps
are located between two embedded interpretations. First, we construct
the threshold approximation not for the greatest relevant input level
n as introduced earlier, but up to level n + 1. Every approximation of
this function up to $\varepsilon' := b^{-(n+1)}$ results in a sufficient approximation
of the embedded $T_P$-operator. Under these conditions, we can replace
the threshold functions by appropriately set up sigmoidal functions. We
just need to make sure that the sigmoidal functions approximate the
threshold functions on all embedded interpretations up to $\varepsilon'$. For the
example of Program 7.5.1, see also Example 7.5.7, we obtain the following
sigmoidal functions: $S^{0.016}_{0.042,\,135.9947}$, $S^{-0.078}_{0.167,\,53.8647}$, and $S^{0.016}_{0.292,\,135.9947}$, where

$$S^{h}_{p,s}(x) := \frac{h}{1 + e^{-s(x-p)}}.$$


4. Approximating the embedded Tp-operator using a sigmoidal network. 
The approximating sigmoidal functions constructed in Step 3 can easily 
be embedded into a standard 3-layer sigmoidal network as follows: the 
input and output layer contain exactly one unit computing the identity 
function. The hidden layer contains a sigmoidal unit for every sigmoidal 
function constructed in Step 3. The weights from input to hidden layer 
are set up such that they represent the steepness of the constructed sig- 
moidal. The thresholds of the hidden layer correspond to the locations 
of the sigmoidal functions, and the weights from hidden to output layer 
coincide with the step width of the underlying threshold functions. 
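
Steps 1 to 3 can be traced numerically for $P_3$ of Example 7.5.7. The sketch below is an illustration under its own assumptions, not the construction of [Bader, 2009]: it uses b = 3, places each jump in the middle of the gap between neighbouring constant pieces, and picks each steepness so that every sigmoid deviates from its step by at most tol divided by the number of steps outside that gap. The step heights and positions it computes are therefore its own and are not the parameters quoted in the text.

```python
import math

B = 3  # base of the embedding


def iota(levels):
    """Embed a finite set of atom levels."""
    return sum(B ** -lev for lev in levels)


def tp3(levels):
    """T_{P_3} on sets of levels (1: even(a), 2: odd(a), 3: even(s(a)))."""
    out = {1}                 # even(a) <-
    if 1 not in levels:
        out.add(2)            # odd(a) <- not even(a)
    if 2 in levels:
        out.add(3)            # even(s(a)) <- odd(a)
    return out


# Step 1: constant pieces (lo, hi, value). Only levels 1 and 2 occur in
# bodies, so the embedded operator is constant on each interval [lo, hi].
TAIL = B ** -2 / (B - 1)      # largest possible contribution of levels > 2
PIECES = sorted((iota(s), iota(s) + TAIL, iota(tp3(s)))
                for s in (set(), {1}, {2}, {1, 2}))

# Step 2: one step per gap: (height, position, half width of the gap).
STEPS = [(nxt[2] - cur[2], (cur[1] + nxt[0]) / 2, (nxt[0] - cur[1]) / 2)
         for cur, nxt in zip(PIECES, PIECES[1:])]
BASE = PIECES[0][2]


def threshold_sum(x):
    """Sum of threshold functions reproducing the constant pieces."""
    return BASE + sum(h for h, p, _ in STEPS if x >= p)


def sigmoid_sum(x, tol):
    """Step 3: sigmoids steep enough for a total error of at most tol."""
    total = BASE
    for h, p, half_gap in STEPS:
        ratio = len(STEPS) * abs(h) / tol
        s = math.log(max(ratio - 1.0, 1.0)) / half_gap  # steepness
        total += h / (1.0 + math.exp(-s * (x - p)))
    return total
```

On every embedded interpretation, that is, on every point of the four intervals in `PIECES`, `sigmoid_sum(x, tol)` stays within `tol` of the value of the threshold sum. Inside the gaps the two functions may differ considerably, which is harmless because no embedded interpretation lies there.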


Figure 7.9 shows the resulting network for $\varepsilon = 0.04$ corresponding to an
approximation of the $T_P$-operator up to level 3.

FIGURE 7.9: An approximating sigmoidal network for Program 7.5.1.


We are now in a position to state the following theorem.³⁰


7.5.8 Theorem Let P be a covered logic program, let b > 2, and let $\varepsilon > 0$.
Then we can construct a 3-layer feedforward sigmoidal network whose network
function approximates $T_P$ up to $\varepsilon$.


Both approaches presented in the last two sections are based on a subset of 
ground(P) and embedding the approximated Tp-operator. While the approach 
presented in Section 7.5.2 creates an input and output unit for every ground 
atom, we created just a single unit here. Thus, to increase the accuracy of 
the network we simply have to add a unit to the hidden layer, but the input 
and output layers can be kept unchanged. Unfortunately, using only a single 
unit limits the overall accuracy once the network is implemented on a real 
computer. 


7.5.4 Approximation by Radial-Basis-Function Networks 


Radial-basis-function (RBF) networks are another common neural network
architecture.³¹ As in the case of sigmoidal networks, they are known to be
universal approximators for continuous functions on compact subsets of $\mathbb{R}^n$.
An RBF network consists of three layers: the input, hidden, and output layers.
The activation of units in the input layer is set from outside. But in contrast to
the networks discussed so far, the hidden units do not compute the weighted
sum, but compute the distance between the vector of input unit activations
and the weight vector of the corresponding connection. That is, the potential of
unit k with $n_k$ incoming connections is computed as $p_k(t) = m(\vec{v}, \vec{w}_k)$, with m
denoting a metric over $n_k$-dimensional vectors, $\vec{v}$ denoting the vector of input
unit activations, and $\vec{w}_k$ denoting the vector of weights of the connections to
unit k. Usually, the Euclidean distance between the two vectors is used as
the distance function m.


30 The proof and all details of the construction involved in this result can be found in
[Bader, 2009].
31 Good introductions to them can be found in [Rojas, 1996] and [Bishop, 1995].

FIGURE 7.10: The raised cosine activation function and an approximation
of the embedded $T_P$-operator of Program 7.5.1 using raised cosine activation
functions. Each constant piece is represented using two raised cosines.


In the constructions below, we use the raised cosine function (see
Figure 7.10) to compute the activation of the hidden units:

$$
\mathrm{rcos}^{h}_{p,w} : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
\frac{h}{2}\left(1 + \cos\left(\frac{\pi (x - p)}{w}\right)\right) & \text{if } |x - p| \leq w, \\
0 & \text{otherwise.}
\end{cases}
$$

Note that if two raised cosines $\mathrm{rcos}^{h}_{p_1,w}$ and $\mathrm{rcos}^{h}_{p_2,w}$ with $|p_1 - p_2| = w$ are
added, we obtain a function that is constant on the interval $[p_1, p_2]$. Therefore,
we can represent each constant piece from above by two raised cosines.
Figure 7.10 shows the approximation of the $T_P$-operator for our running example.
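
The plateau property is a one-line trigonometric identity, since $\cos\theta + \cos(\theta - \pi) = 0$, and it can also be checked numerically. The sketch below assumes the raised cosine has the form $\frac{h}{2}(1 + \cos(\pi (x - p)/w))$ for $|x - p| \leq w$ and 0 elsewhere; the helper name `plateau` is illustrative.

```python
import math


def rcos(x, h, p, w):
    """Raised cosine of height h, centre p and half-width w (assumed form)."""
    if abs(x - p) <= w:
        return h / 2 * (1 + math.cos(math.pi * (x - p) / w))
    return 0.0


def plateau(x, h, p1, p2):
    """Sum of two raised cosines whose centres are exactly one width apart."""
    w = p2 - p1
    return rcos(x, h, p1, w) + rcos(x, h, p2, w)
```

For any x between `p1` and `p2`, `plateau(x, h, p1, p2)` equals `h` up to rounding error, and it vanishes for inputs further than `w` from both centres.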

As above, the approximation by raised cosines can easily be implemented 
using an RBF network. The resulting network contains a single input and 
output unit serving as interface. Every raised cosine necessary for the approx- 
imation is computed by a single hidden unit. The weight between the input 
and the hidden layer contains the position, and the weight between the hidden 
and the output unit represents the height of the function. Figure 7.11 shows 
the RBF network for Program 7.5.1. Using these insights, we can state the 
following theorem, again without proof. 


7.5.9 Theorem Let P be a covered logic program, let b > 2, and let $\varepsilon > 0$.
Then we can construct an RBF network whose network function approximates
$T_P$ up to $\varepsilon$.


Unfortunately, the two approaches discussed in Sections 7.5.3 and 7.5.4 
only allow for limited accuracy when implemented on a real computer. This is 
due to the fact that a single unit is used in the input layer and in the output 
layer. Even though we can assume unlimited accuracy of real number opera- 
tions in theory, we cannot assume this when using a computer. To overcome 
this drawback, we discuss another approach in the following section. 




FIGURE 7.11: An RBF network approximating the $T_P$-operator of
Program 7.5.1.

FIGURE 7.12: A two-dimensional version of the Cantor set obtained by
embedding all interpretations using a two-dimensional bijective level mapping.

FIGURE 7.13: A construction of the two-dimensional version of the Cantor
set.


7.5.5 Approximation by Vector-Based Networks 


The approaches presented above are based on level mappings with
co-domain $\omega$. Here we extend this approach to multi-dimensional level mappings,
which permits the embedding of interpretations into vectors of real numbers.
An n-dimensional level mapping is a function $l : B_P \to \omega \times \{1, \ldots, n\}$, that
is, to each atom A we assign a level $l_l(A) \in \omega$ and some dimension
$l_d(A) \in \{1, \ldots, n\}$. As above, we assume a bijective level mapping. On embedding
interpretations into n-dimensional real vectors, we obtain an n-dimensional
version of the classical Cantor set. A two-dimensional version is shown in
Figure 7.12.


Unfortunately, the results obtained so far cannot be extended to the n- 
dimensional case, at least we do not know how to make such an extension. 
But nevertheless we can construct approximating networks employing certain 
knowledge that we have about the set of embedded interpretations. Figure 7.13 
shows a possible way of constructing the two-dimensional Cantor set. Starting 
from a square, in every iteration the current version is copied and scaled 
down four times. Afterwards, the four copies are placed in the corners. The 
squares occurring in the n-th step of the construction are referred to below as 
hypercubes of level n. 


As for the one-dimensional case, the $T_{P_n}$-operator turns out to be a
piecewise constant function. Let $P_n$ be as previously defined, and let $\hat{n}$ be the
maximal level of a body atom in $P_n$. Then the operator $T_{P_n}$ is constant on all
those interpretations which agree on all atoms up to level $\hat{n}$, and those areas
coincide with the hypercubes of level $\hat{n}$.³²

FIGURE 7.14: A depiction of the approximation of the $T_P$-operator for
Program 7.5.1 using a vector-based network.

Vector-based networks³³ can be thought of as a generalization of the
so-called self-organizing maps.³⁴ A number of units are distributed over the input
space. For every input given to the network, the closest unit is selected as the 
winner unit. The winner’s activation is set to 1, and the activation of all other 
units is set to 0. Thus, only the winner influences the output of the network. 

By setting up a network such that there is a unit for every hypercube
of level $\hat{n}$, we can directly embed the $T_{P_n}$-operator into the weights of the
connections from those units to the output units. Figure 7.14 shows what such
a network for the one-dimensional case could look like.³⁵ For every hypercube
(coinciding with intervals in the one-dimensional case) a unit is added to the 
network. The weights between the input and hidden layers define (as for RBF 
networks) the location of the unit, and the weights between the hidden and 
output layers define the output, that is, the value of the embedded Tp-operator 
for an interpretation within the input area of the unit. As before, we are now 
in a position to state a theorem asserting the existence of approximating 
vector-based networks, as follows. 


7.5.10 Theorem Let P be a covered logic program, let b > 2, and let
$\varepsilon > 0$. Then we can construct a vector-based network whose network function
approximates $T_P$ up to $\varepsilon$.
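
In the one-dimensional case the winner-take-all mechanism amounts to a nearest-centre lookup. The following minimal sketch does this for $P_3$ of Example 7.5.7 with b = 3: there is one hidden unit per interval of the greatest relevant input level (here 2), positioned at the centre of that interval, and its output weight is the embedded value of $T_{P_3}$ on that interval. Atoms are identified with their levels, and the names `UNITS` and `network` are illustrative.

```python
from fractions import Fraction

B = 3       # base of the embedding
N_HAT = 2   # greatest level of a body atom in P_3


def iota(levels):
    """Embed a finite set of atom levels exactly."""
    return sum((Fraction(1, B ** lev) for lev in levels), Fraction(0))


def tp3(levels):
    """T_{P_3} on sets of levels (1: even(a), 2: odd(a), 3: even(s(a)))."""
    out = {1}                 # even(a) <-
    if 1 not in levels:
        out.add(2)            # odd(a) <- not even(a)
    if 2 in levels:
        out.add(3)            # even(s(a)) <- odd(a)
    return out


# Width of each interval: the largest contribution of atoms of level > N_HAT.
TAIL = Fraction(1, B ** N_HAT * (B - 1))

# One hidden unit per interval: (centre = input weight, output weight).
UNITS = [(iota(s) + TAIL / 2, iota(tp3(s)))
         for s in (set(), {1}, {2}, {1, 2})]


def network(x):
    """Winner-take-all: only the unit closest to x determines the output."""
    _centre, output = min(UNITS, key=lambda unit: abs(x - unit[0]))
    return output
```

For every finite interpretation I, `network(iota(I))` returns exactly `iota(tp3(I))`, because atoms of level greater than 2 move the input by less than the distance to the next interval.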


By using an m-dimensional level mapping, we fix the network to have 
m input and m output units. That is, we can increase the accuracy of the 
network by using more units. Unfortunately, the number of hidden units grows 
exponentially with the dimension of the input layer. Nonetheless, we are now 
in a position to trade accuracy against space, which has not been possible 
before. 


32 See [Bader, 2009] for details.

33 See [Martinetz and Schulten, 1991, Fritzke, 1998] for further details.

34 [Kohonen, 1981, Haykin, 1994].

35 The n-dimensional case for n > 1 is hard to depict because the graphics need to be
(n + 1)-dimensional.

Just as for the network architectures described previously, we can train
vector-based networks using a set of input-output pairs. The position of the 
units, that is, the weights between the input and hidden layer, are modified 
such that a unit is located in the centre of all the inputs it is responsible for. 
The output weights are trained such that they represent the average output 
of all inputs within the unit’s responsibility. If, furthermore, two neighbouring 
units have similar output weights, then one of them can be removed because 
the other unit will take over in that eventuality. A unit whose accumulated 
error is very large can be replaced by two units that can be adapted indepen- 
dently, thus allowing the network to refine its input-output function in certain 
areas. 

The first experiments which reported on this approach³⁶ showed the
applicability of this learning method in the area of neural-symbolic integration.
A randomly initialized network was trained using the embedded versions of
an interpretation I as input values and of $T_P(I)$ as output values for a given
program P. The network learned the mapping and could be used iteratively
by adding recurrent connections between the output and input layers. 


7.5.6 Approximating the (Least) Fixed Point of Tp 


Thus far, we have discussed at some length the issue of the approximate
computation of the $T_P$-operator for first-order normal logic programs $P$. We
turn now to discussing, fairly briefly, the question of the approximate
computation of its fixed points. One approach is to carry forward the work of the
previous sections and employ iterates of (recurrent) neural networks which
approximate $T_P$ to approximate iterates of the operator $T_P$, but, as already
noted earlier, the problem then emerges of uniformly controlling the error
estimates under iteration.³⁷

On the other hand, one can approach the problem of computing the least
fixed point of $T_P$ for arbitrary definite logic programs $P$ by a modification of
the previous approach employing the subset $P_n$ of ground($P$), except that we
do not assume that $P$ is covered, and instead we ensure that the appropriate
subset of ground($P$) is finite by other means.

Thus, let $P$ denote an arbitrary (first-order) definite logic program, and
denote by $I$ the least fixed point of $T_P$. Let $l : B_P \to \omega$ be a level mapping
with the property that $l^{-1}(n)$ is a finite set for each $n \in \omega$. We proceed to
sketch the details of the construction of a finite subset $P_n$ of ground($P$), where
$n$ is a given natural number, which will play the sort of role here that $P_n$ plays
in Proposition 7.5.5 and its companion results.³⁸ We start with the following
claim.


36See [Bader et al., 2007]. 

37This point is discussed in [Hitzler et al., 2004, Section 4.3], but quite strong conditions, 
for example, Lipschitz continuity [Hitzler et al., 2004, Theorem 4.19], are required for things 
to work satisfactorily. 

38See [Seda, 2006] for full details. 
Claim. Suppose that $A \in T_P \uparrow k$. Then there is a clause $A \leftarrow \mathit{body}$ in
ground($P$) such that $A$ does not occur in $\mathit{body}$ and $T_P \uparrow (k - 1) \models \mathit{body}$.

To establish this claim, we first note that it is clear that $k \geq 1$. Suppose
that $A \in T_P \uparrow k_0 = T_P(T_P \uparrow (k_0 - 1))$ and that $k_0$ is the smallest natural
number with this property. Then there is a clause $A \leftarrow \mathit{body}$ in ground($P$)
such that $T_P \uparrow (k_0 - 1) \models \mathit{body}$. By definition of $k_0$, we have
$A \notin T_P \uparrow (k_0 - 1)$, and hence $A$ does not occur in $\mathit{body}$. Finally, since
$k_0 \leq k$, we obtain by monotonicity that $T_P \uparrow (k - 1) \models \mathit{body}$, as required.


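Since $P$ is definite, we have

$$T_P \uparrow 0 \subseteq T_P \uparrow 1 \subseteq \cdots \subseteq T_P \uparrow n \subseteq \cdots \subseteq I = \bigcup_{n \in \mathbb{N}} T_P \uparrow n,$$

where $T_P \uparrow n$ denotes the $n$-th upward power $T_P^n(\emptyset)$ of $T_P$, as usual.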


Given $n \in \mathbb{N}$, there are only finitely many atoms $A_1, A_2, \ldots, A_m \in I$
with $l(A_i) < n$ for $i = 1, \ldots, m$, and, by directedness, there is (a smallest)
$k = k_n \in \mathbb{N}$ such that $A_1, A_2, \ldots, A_m \in T_P \uparrow k_n$.³⁹ Consider the atom $A_i$,
where $1 \leq i \leq m$, and the following three steps.


(1) We have $A_i \in T_P \uparrow k_n = T_P(T_P \uparrow (k_n - 1))$. Therefore, there is a
clause

$$A_i \leftarrow A_i^1(1), \ldots, A_i^{m(i)}(1)$$

in ground($P$) such that $A_i^1(1), \ldots, A_i^{m(i)}(1) \in T_P \uparrow (k_n - 1)$. Note that this
clause may be a unit clause, that is, $m(i) \geq 0$, and there may be many such
clauses with head $A_i$; we choose one of them.


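(2) Because $A_i^1(1), \ldots, A_i^{m(i)}(1) \in T_P \uparrow (k_n - 1) = T_P(T_P \uparrow (k_n - 2))$,
there are clauses in ground($P$) as follows.

$$\begin{aligned}
A_i^1(1) &\leftarrow A_{i,1}^1(2), \ldots, A_{i,1}^{m(i,1)}(2) \\
A_i^2(1) &\leftarrow A_{i,2}^1(2), \ldots, A_{i,2}^{m(i,2)}(2) \\
&\;\;\vdots \\
A_i^{m(i)}(1) &\leftarrow A_{i,m(i)}^1(2), \ldots, A_{i,m(i)}^{m(i,m(i))}(2)
\end{aligned}$$

where each of the atoms $A_{i,j}^k(2)$ in each of the bodies belongs to $T_P \uparrow (k_n - 2)$.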


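(3) Because each of the $A_{i,j}^k(2)$ in Step (2) belongs to $T_P \uparrow (k_n - 2) =
T_P(T_P \uparrow (k_n - 3))$, we have a finite collection of ground clauses (one for each
of the $A_{i,j}^k(2)$ in Step (2)) as follows.

$$\begin{aligned}
A_{i,1}^1(2) &\leftarrow A_{i,1,1}^1(3), \ldots, A_{i,1,1}^{m(i,1,1)}(3) \\
A_{i,1}^2(2) &\leftarrow A_{i,1,2}^1(3), \ldots, A_{i,1,2}^{m(i,1,2)}(3) \\
&\;\;\vdots \\
A_{i,1}^{m(i,1)}(2) &\leftarrow A_{i,1,m(i,1)}^1(3), \ldots, A_{i,1,m(i,1)}^{m(i,1,m(i,1))}(3) \\[6pt]
A_{i,2}^1(2) &\leftarrow A_{i,2,1}^1(3), \ldots, A_{i,2,1}^{m(i,2,1)}(3) \\
A_{i,2}^2(2) &\leftarrow A_{i,2,2}^1(3), \ldots, A_{i,2,2}^{m(i,2,2)}(3) \\
&\;\;\vdots \\
A_{i,2}^{m(i,2)}(2) &\leftarrow A_{i,2,m(i,2)}^1(3), \ldots, A_{i,2,m(i,2)}^{m(i,2,m(i,2))}(3) \\
&\;\;\vdots \\[6pt]
A_{i,m(i)}^1(2) &\leftarrow A_{i,m(i),1}^1(3), \ldots, A_{i,m(i),1}^{m(i,m(i),1)}(3) \\
&\;\;\vdots \\
A_{i,m(i)}^{m(i,m(i))}(2) &\leftarrow A_{i,m(i),m(i,m(i))}^1(3), \ldots, A_{i,m(i),m(i,m(i))}^{m(i,m(i),m(i,m(i)))}(3)
\end{aligned}$$

where each atom in each body belongs to $T_P \uparrow (k_n - 3)$.


39Notice that, depending on $l$, there may be no atoms $A$ with $l(A) < n$; this case is
handled by the abuse of notation obtained by allowing $m$ to be 0.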

Note that at each stage in this process we select a ground clause in which 
the head of the clause does not occur in the body by means of the claim 
established earlier. 

This process terminates producing unit clauses in its last step. Let $P_{i,n}$
denote the (finite) subset of ground($P$) consisting of all the clauses which
result; it is clear that $T_{P_{i,n}} \uparrow k_n$ consists of the heads of all the clauses in
$P_{i,n}$. We carry out this construction for $i = 1, \ldots, m$ to obtain programs
$P_{1,n}, \ldots, P_{m,n}$ such that, for $i = 1, \ldots, m$,
$T_{P_{i,n}}(T_{P_{i,n}} \uparrow k_n) = T_{P_{i,n}} \uparrow k_n$
(indeed, $T_{P_{i,n}} \uparrow k_n$ is the least fixed point of $T_{P_{i,n}}$ by Kleene's theorem,
Theorem 1.1.9), $A_i \in T_{P_{i,n}} \uparrow k_n$, and
$T_{P_{i,n}} \uparrow r \subseteq T_P \uparrow r \subseteq I$ for all $r \in \mathbb{N}$. Let
$P_n$ denote the program $P_{1,n} \cup \ldots \cup P_{m,n}$. Then $P_n$ is a finite subprogram
of ground($P$), and
$T_{P_{i,n}} \uparrow k_n \subseteq T_{P_n} \uparrow k_n \subseteq T_P \uparrow k_n \subseteq I$ for $i = 1, \ldots, m$.
Furthermore, $A_1, \ldots, A_m \in T_{P_n} \uparrow k_n$, and $T_{P_n} \uparrow k_n$ is the least fixed
point $I_n$ of $T_{P_n}$.

This completes the construction of the program $P_n$.


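7.5.11 Example We illustrate the process just described with $k = k_n = 3$.
Suppose that $A_i \in T_P \uparrow 3 = T_P(T_P \uparrow 2)$. Then there is a ground clause
$A_i \leftarrow B_1, B_2$, say, with $B_1, B_2 \in T_P \uparrow 2 = T_P(T_P \uparrow 1)$. Therefore, there exist
ground clauses $B_1 \leftarrow C_1, C_2, C_3$ and $B_2 \leftarrow$, say, with
$C_1, C_2, C_3 \in T_P \uparrow 1 = T_P(\emptyset)$.
It follows that there are unit clauses $C_1 \leftarrow$, $C_2 \leftarrow$, and $C_3 \leftarrow$ in ground($P$).
Thus, $P_{i,n}$ is the program

$$\begin{aligned}
C_1 &\leftarrow \\
C_2 &\leftarrow \\
C_3 &\leftarrow \\
B_2 &\leftarrow \\
B_1 &\leftarrow C_1, C_2, C_3 \\
A_i &\leftarrow B_1, B_2
\end{aligned}$$

Then we have the following calculations: $T_{P_{i,n}} \uparrow 0 = \emptyset$,
$T_{P_{i,n}} \uparrow 1 = T_{P_{i,n}}(\emptyset) = \{B_2, C_1, C_2, C_3\}$,
$T_{P_{i,n}} \uparrow 2 = T_{P_{i,n}}(\{B_2, C_1, C_2, C_3\}) = \{B_1, B_2, C_1, C_2, C_3\}$,
$T_{P_{i,n}} \uparrow 3 = \{A_i, B_1, B_2, C_1, C_2, C_3\}$, and
$T_{P_{i,n}} \uparrow 4 = T_{P_{i,n}}(T_{P_{i,n}} \uparrow 3) = \{A_i, B_1, B_2, C_1, C_2, C_3\} = T_{P_{i,n}} \uparrow 3$.
Thus, $T_{P_{i,n}} \uparrow 3$ is a fixed point of $T_{P_{i,n}}$
and indeed is the least such fixed point. Moreover, $A_i \in T_{P_{i,n}} \uparrow 3$.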
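
The calculation in Example 7.5.11 can be replayed mechanically. The following
Python sketch is illustrative only: it assumes that a propositional definite
program is encoded as a list of (head, body) pairs, and the names `tp`,
`upward_powers`, and the atom labels (with "Ai" standing for the atom $A_i$)
are choices made here rather than notation from the text.

```python
def tp(program, interpretation):
    """One application of the single-step operator to a set of true atoms."""
    return frozenset(
        head for head, body in program if all(b in interpretation for b in body)
    )


def upward_powers(program):
    """Return the upward powers of tp, stopping at the first fixed point."""
    powers = [frozenset()]
    while True:
        nxt = tp(program, powers[-1])
        if nxt == powers[-1]:
            return powers
        powers.append(nxt)


# The program of Example 7.5.11 (hypothetical string labels for the atoms).
program = [
    ("C1", ()), ("C2", ()), ("C3", ()), ("B2", ()),
    ("B1", ("C1", "C2", "C3")),
    ("Ai", ("B1", "B2")),
]

for k, power in enumerate(upward_powers(program)):
    print(k, sorted(power))
```

The loop stops as soon as an iterate repeats, which for a finite definite
program must happen after finitely many steps because the iterates increase.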


Further properties of $P_n$ can be found in [Seda, 2007].

Now let $\varepsilon > 0$ be given and choose $n$ so large that $2^{-n} < \varepsilon$. Then
$d_l(I_n, I) \leq 2^{-n} < \varepsilon$, where $I_n$ is the least fixed point of $T_{P_n}$, $I$ is the
least fixed point of $T_P$, as noted above, and $d_l$ is the metric associated with $l$.
Now apply the algorithm of Theorem 7.4.1 to the propositional program $P_n$ and make
the resulting network $F_n$ (which computes $T_{P_n}$) recurrent. On inputting the
interpretation $\emptyset$ to this network and iterating $n$ times, we obtain $I_n$ as output.
Thus, $F_n$ approximates $I$ up to $\varepsilon$, and in this sense the family
$\{F_n \mid n \in \mathbb{N}\}$ computes $I$.


7.5.12 Example Take $P$ to be as in Example 3.2.3, that is, the program

$$\begin{aligned}
p(a) &\leftarrow \\
p(s(X)) &\leftarrow p(X)
\end{aligned}$$

Applying the procedure above to $P$, we obtain a sequence $F_n$ of 3-layer
feedforward recurrent neural networks which computes the least fixed point of $T_P$
and hence computes the set of natural numbers.
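
To make the approximation in Example 7.5.12 concrete, the following Python
sketch builds the finite subprograms and measures their distance from the least
fixed point. It rests on assumptions that are not fixed by the text: the atom
$p(s^k(a))$ is encoded as the integer $k$, the level mapping is taken to be
$l(p(s^k(a))) = k$, and $d_l$ is taken to be $2^{-j}$, where $j$ is the least
level at which two interpretations differ. All function names are hypothetical.

```python
def ground_subprogram(n):
    """Clauses needed to derive the atoms of level < n (atom p(s^k(a)) is k)."""
    return [(k, () if k == 0 else (k - 1,)) for k in range(n)]


def least_fixed_point(program):
    """Iterate the single-step operator from the empty interpretation."""
    current = frozenset()
    while True:
        nxt = frozenset(
            head for head, body in program if all(b in current for b in body)
        )
        if nxt == current:
            return current
        current = nxt


def distance_to_naturals(interpretation):
    """Assumed metric d_l between an interpretation and the set of all naturals."""
    k = 0
    while k in interpretation:
        k += 1
    return 2.0 ** (-k)


def index_for(epsilon):
    """Smallest n with 2^(-n) < epsilon."""
    n = 0
    while 2.0 ** (-n) >= epsilon:
        n += 1
    return n


n = index_for(0.01)
approximation = least_fixed_point(ground_subprogram(n))
print(n, sorted(approximation), distance_to_naturals(approximation))
```

Under these assumptions the least fixed point of the $n$-th subprogram is
$\{0, \ldots, n-1\}$, which agrees with the set of natural numbers on every atom
of level below $n$.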


7.6 Some Extensions — The Propositional Case 


So far in this chapter, we have concentrated on the operator $T_P$. However,
in this section and the next we want to briefly consider extensions of our results
to other operators and hence to other semantics. In the present section, we will
focus on propositional normal logic programs $P$ and extensions of the results of
Section 7.4. In particular, we consider extensions of Theorem 7.4.1 to
Fitting-style operators $F_P$, including of course the special cases of $\Phi_P$ for
Kleene's strong three-valued logic and the corresponding operator $\Psi_P$ for
Belnap's logic FOUR. In the next section, Section 7.7, we will consider extensions
of Section 7.5.2, or in other words we will consider approximations of local
consequence operators, including Fitting-style operators, and the
Gelfond–Lifschitz operator.

In fact, one can adopt an algebraic approach to the material presented in
this section at little extra cost, but with the benefit that the results apply
to constraint logic programs (with constraints belonging to a given semiring)
and to logic programs involving uncertainty expressed via many-valued logics,
as well as to conventional logic programs. We shall not do that, however,
as it would take us too far afield, requiring a definition of logic programs
allowing elements of an abstract set (the set $C$ in the next definition) in clause
bodies and a corresponding new definition of Fitting-style operators. Instead,
we content ourselves with sketching the development for conventional logic
programs.⁴⁰ Nevertheless, we will present the material in full generality where
it helps, ultimately specializing to logics $\mathcal{T}$. Thus, we next present one of the
main definitions we need in full generality, as follows.


7.6.1 Definition Suppose that $C$ is a set equipped with a binary operation $\odot$.
We say that $\odot$ is finitely determined or that products (relative to $\odot$) are finitely
determined in $C$ if, for each $c \in C$, there exists a countable (possibly infinite)
collection $\{(R_c^n, E_c^n) \mid n \in J\}$ of pairs of sets $R_c^n \subseteq C$ and $E_c^n \subseteq C$, where each
$R_c^n$ is finite, such that a countable (possibly infinite) product $\bigodot_{i \in M} c_i$ in $C$ is
equal to $c$ if and only if for some $n \in J$ the following statements hold.


(1) $R_c^n \subseteq \{c_i \mid i \in M\}$.


(2) For all $i \in M$, $c_i \notin E_c^n$, that is, $\{c_i \mid i \in M\} \subseteq (E_c^n)^{co}$, where $(E_c^n)^{co}$
denotes the complement of the set $E_c^n$.


We call the elements of $E_c^n$ excluded values for $c$, we call the elements of
$A_c^n = (E_c^n)^{co}$ allowable values for $c$, and in particular we call the elements of
$R_c^n$ required values for $c$; note that, for each $n \in J$, we have $R_c^n \subseteq A_c^n$, so
that each required value is also an allowable value (but not conversely). More
generally, given $c \in C$, we call $s \in C$ an excluded value for $c$ if no product
$\bigodot_{i \in M} c_i$ with $\bigodot_{i \in M} c_i = c$ contains $s$, that is, in any product
$\bigodot_{i \in M} c_i$ whose value is equal to $c$, we have $c_i = s$ for no $i \in M$. We let
$E_c$ denote the set of all excluded values for $c$, and let $A_c$ denote the complement
$(E_c)^{co}$ of $E_c$ and call it the set of all allowable values for $c$. Note finally that
when confusion might otherwise result, we will superscript each of the sets
introduced above with the operation in question. Thus, for example, $A_c^{\odot}$ denotes
the allowable set for $c$ relative to the operation $\odot$.


40For full details of the sketch we present here, the reader should consult the following
papers: [Seda and Lane, 2005], [Lane and Seda, 2006], [Komendantskaya et al., 2007] and
also [Lane and Seda, 2009].


In particular, we can take $C$ as a logic $\mathcal{T}$ and $\odot$ as either disjunction or
conjunction defined on it. Indeed, the following example and the paragraph
following it show the thinking behind Definition 7.6.1, and in fact we shall take
FOUR as a running example throughout this section. Note that, throughout
this section, we take FOUR to be the set $\{u, f, t, b\}$ with this given listing of
its elements, as in Chapter 1.


7.6.2 Example Consider again Belnap's logic FOUR. Taking $\odot$ to be
disjunction $\vee$, the sets $E$ and $R$ are as follows.


(1) For $u$, we have $n = 1$, $E_u^{\vee} = \{t, b\}$, and $R_u^{\vee} = \{u\}$.

(2) For $f$, we have $n = 1$, $E_f^{\vee} = \{u, t, b\}$, and $R_f^{\vee} = \{f\}$.

(3) For $t$, $n$ takes the values 1 and 2, $E_t^{\vee} = \emptyset$, $R_t^{\vee,1} = \{t\}$, and $R_t^{\vee,2} = \{u, b\}$.

(4) For $b$, we have $n = 1$, $E_b^{\vee} = \{u, t\}$, and $R_b^{\vee} = \{b\}$.


Thus, for example, a countable disjunction $\bigvee_{i \in M} s_i$ takes value $t$ if and only
if either (i) at least one of the $s_i$ takes value $t$ or (ii) at least one of the $s_i$
takes value $b$ and at least one takes value $u$; no truth value is excluded.

Now taking $\odot$ to be conjunction $\wedge$, the sets $E$ and $R$ are as follows.


(1) For $u$, we have $n = 1$, $E_u^{\wedge} = \{f, b\}$, and $R_u^{\wedge} = \{u\}$.

(2) For $f$, $n$ takes the values 1 and 2, $E_f^{\wedge} = \emptyset$, $R_f^{\wedge,1} = \{f\}$, and $R_f^{\wedge,2} = \{u, b\}$.

(3) For $t$, we have $n = 1$, $E_t^{\wedge} = \{u, f, b\}$, and $R_t^{\wedge} = \{t\}$.

(4) For $b$, we have $n = 1$, $E_b^{\wedge} = \{u, f\}$, and $R_b^{\wedge} = \{b\}$.
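
The required and excluded sets of Example 7.6.2 can be checked against the
binary operations of FOUR. The Python sketch below is a minimal illustration
under stated assumptions: the tables are transcribed from the example, the
helpers `join` and `meet` encode disjunction and conjunction for the truth
ordering $f < u, b < t$, and all identifiers are hypothetical names introduced
here.

```python
from itertools import product as tuples

FOUR = ("u", "f", "t", "b")

# For each value c: (list of required sets, excluded set), from Example 7.6.2.
DISJUNCTION = {
    "u": ([{"u"}], {"t", "b"}),
    "f": ([{"f"}], {"u", "t", "b"}),
    "t": ([{"t"}, {"u", "b"}], set()),
    "b": ([{"b"}], {"u", "t"}),
}
CONJUNCTION = {
    "u": ([{"u"}], {"f", "b"}),
    "f": ([{"f"}, {"u", "b"}], set()),
    "t": ([{"t"}], {"u", "f", "b"}),
    "b": ([{"b"}], {"u", "f"}),
}


def evaluate(values, table):
    """Value of a product, read off from the required and excluded sets."""
    present = set(values)
    matches = [
        c for c, (required, excluded) in table.items()
        if present.isdisjoint(excluded) and any(r <= present for r in required)
    ]
    if len(matches) != 1:
        raise ValueError(f"no unique value determined for {values}")
    return matches[0]


def join(x, y):
    """Binary disjunction in FOUR."""
    if x == y:
        return x
    if x == "f":
        return y
    if y == "f":
        return x
    return "t"


def meet(x, y):
    """Binary conjunction in FOUR."""
    if x == y:
        return x
    if x == "t":
        return y
    if y == "t":
        return x
    return "f"


# Compare both descriptions on every triple of truth values.
for x, y, z in tuples(FOUR, repeat=3):
    assert evaluate((x, y, z), DISJUNCTION) == join(join(x, y), z)
    assert evaluate((x, y, z), CONJUNCTION) == meet(meet(x, y), z)
```

Only the set of values occurring in a product matters for `evaluate`, which
reflects the fact that a finitely determined operation is idempotent,
commutative, and associative.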


In fact, Definition 7.6.1 was motivated by the problem, already mentioned,
of defining truth values of bodies of pseudo-clauses over various three-valued
logics, see [Hitzler and Seda, 1999b] and Sections 5.2.1 and 5.5 herein. The
following facts show how it works, where we take the countable set $M$ to be
$\mathbb{N}$ without loss of generality. If $\odot$ is finitely determined, then it is idempotent,
commutative, and associative, as already noted in Section 5.5. Furthermore,
if $\bigodot_{i \in \mathbb{N}} s_i = c$, then the sequence $s_1, s_1 \odot s_2, s_1 \odot s_2 \odot s_3, \ldots$ is eventually
constant with value $c$. In the converse direction, suppose $C$ is a countable set
and $\odot$ is idempotent, commutative, and associative. Suppose further that,
for any set $\{s_i \mid i \in M\}$ of elements of $C$ where $M$ is countable, the sequence
$s_1, s_1 \odot s_2, s_1 \odot s_2 \odot s_3, \ldots$ is eventually constant with value $c$. Then all products
in $C$ are (well-defined and) finitely determined, where we take $\bigodot_{i \in M} s_i = c$
to define $\bigodot_{i \in M} s_i$.

For a finitely determined binary operation $\odot$ on $C$, we define the partial
order $\leq_\odot$ on $C$ by $s \leq_\odot t$ if and only if $s \odot t = t$. (So that $s \leq_+ t$ if and only if
$s + t = t$, and $s \leq_\times t$ if and only if $s \times t = t$, for finitely determined operations
$+$ and $\times$, and similarly for finitely determined operations of disjunction $\vee$ and
conjunction $\wedge$ in case $C$ is a logic $\mathcal{T}$.)


7.6.3 Example In FOUR, we have $t \leq_\wedge u \leq_\wedge f$, and $t \leq_\wedge b \leq_\wedge f$. Also,
$f \leq_\vee u \leq_\vee t$, and $f \leq_\vee b \leq_\vee t$.


In fact, the allowable and excluded sets for $s \in C$ can easily be characterized
in terms of the partial orders just defined: $s \in A_t^{\odot}$ if and only if $s \leq_\odot t$, see
[Seda and Lane, 2005, Proposition 3.10]. Because of this fact, we have the
following result.


7.6.4 Proposition Suppose that $\odot$ is a finitely determined binary operation
on $C$ and that $M$ is a countable set. Then a product $\bigodot_{i \in M} t_i$ evaluates to
the element $s \in C$, where $s$ is the least element in the ordering $\leq_\odot$ such that
$t_i \in A_s^{\odot}$ for all $i \in M$.
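
Proposition 7.6.4 says that a product is the least element, in the induced
order, for which every factor is allowable. The following Python sketch
illustrates this for FOUR, using the characterization that $s$ is allowable for
$t$ exactly when $s \leq_\odot t$. The functions `conj`, `disj`, `leq`, and
`product_by_order` are hypothetical helpers written for this illustration, with
the truth ordering $f < u, b < t$ assumed for FOUR.

```python
from functools import reduce

FOUR = ("u", "f", "t", "b")


def conj(x, y):
    """Binary conjunction in FOUR."""
    if x == y:
        return x
    if x == "t":
        return y
    if y == "t":
        return x
    return "f"


def disj(x, y):
    """Binary disjunction in FOUR."""
    if x == y:
        return x
    if x == "f":
        return y
    if y == "f":
        return x
    return "t"


def leq(op, s, t):
    """The induced order: s is below t exactly when s op t equals t."""
    return op(s, t) == t


def product_by_order(op, carrier, values):
    """Least s in the induced order such that every value lies below s."""
    bounds = [s for s in carrier if all(leq(op, v, s) for v in values)]
    least = [s for s in bounds if all(leq(op, s, b) for b in bounds)]
    if len(least) != 1:
        raise ValueError("no unique least bound")
    return least[0]


values = ["u", "b", "t"]
print(product_by_order(conj, FOUR, values), reduce(conj, values))
print(product_by_order(disj, FOUR, values), reduce(disj, values))
```

The second value printed on each line is the product computed by folding the
binary operation, so the two entries on a line should coincide.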


Having now determined how we evaluate the truth values of the bodies
of pseudo-clauses in relation to Fitting-style operators $F_P$, we move
next to consider the computation of these operators by neural networks in
the case of propositional normal logic programs $P$. Indeed, it is shown in
[Lane and Seda, 2006] that one can construct conventional 3-layer feedforward
networks to compute $\Phi_P$ and $\Psi_P$ containing only binary threshold units, in
the style of Theorem 7.4.1.⁴¹ However, extending this approach to the general
case of $F_P$ is not so simple, as the constructions become overly complicated.
Therefore, we will adopt a modular approach in which we construct two types
of 2-layer neural networks of binary threshold units. The first of these (the
multiplication unit) will compute products or conjunctions of elements of $C$,
and the second of them (the addition unit) will compute sums or disjunctions
of elements of $C$. It then remains to construct 3-layer neural networks
to compute $F_P$ in which the hidden layer consists of multiplication units and
the output layer consists of addition units; strictly speaking, these networks
have five layers of course. In this context, it is worth noting that the partial
ordering $\leq_\odot$, defined previously, and Proposition 7.6.4 play a crucial role in
establishing the results we discuss here.

For the rest of this section, we shall focus on finite sets $C$ with $n$ elements
listed in some fixed order, $C = \{c_1, c_2, \ldots, c_n\}$ or $C = \{t_1, t_2, \ldots, t_n\}$, say. In
order to simulate the operations in $C$ by means of neural networks, we need
to represent the elements of $C$ in a form amenable to their manipulation by
neural networks. To do this, we represent elements of $C$ by vectors of $n$ units,
and it is convenient sometimes to view them as column vectors, where the first
unit represents $c_1$, the second unit represents $c_2$, and so on. Hence, a vector of
units with the first unit activated, or containing 1, represents $c_1$, a vector with
the second unit activated, or containing 1, represents $c_2$, etc. Indeed, it will
sometimes be convenient to denote such vectors by binary strings of length $n$
and to refer to the unit in the $i$-th position of a string as the $i$-th unit or the
$c_i$-unit or the unit $c_i$; as is common, we represent these vectors geometrically
by strings of not-necessarily adjacent rectangles. Note that we do not allow
more than one unit to be activated at any given time in any of the vectors
representing elements of $C$, and hence all but one of the units in such vectors
contain 0. Furthermore, when the input is consistent with this, it turns out
from the constructions we make that the output of any network we employ is
consistent with it also.


FIGURE 7.15: A conjunction unit for FOUR. The full arrows represent
connections with weight 1, and the broken arrows represent connections with
weight −1.


41See the thesis [Kalinke, 1994], where these results are stated. We thank S. Hölldobler
for drawing this reference to our attention.


7.6.5 Example Suppose that C = FOUR = {u,f,t,b}. Then u is repre- 
sented by 1000, f by 0100, t by 0010, and b by 0001. 


In general, the operations in C are not linearly separable, and therefore 
we need two layers to compute addition (or disjunction) and two to compute 
multiplication (or conjunction). As usual, we take the standard threshold for 
binary threshold units to be 0.5. This ensures that the Heaviside function H 
outputs 1 if the input is strictly greater than 0, rather than greater than or 
equal to 0. 


7.6.6 Definition A multiplication ($\times$) unit or a conjunction ($\wedge$) unit MU
for a given set $C$ is a 2-layer neural network in which each layer is a vector of
$n$ binary threshold units $c_1, c_2, \ldots, c_n$ corresponding to the $n$ elements of $C$.
The units in the input layer have thresholds $l - 0.5$, where $l$ is the number of
elements being multiplied or conjoined, and all output units have threshold
0.5. We connect input unit $c_i$ to the output unit $c_i$ with weight 1 and to any
unit $c_j$ in the output layer, where $c_i <_\times c_j$, with weight $-1$.


An input layer representing a product of $l$ elements of $C$ is connected to
a multiplication unit MU in the following way. For each element $c$ of the
product, where $c$ is represented by the $n$ units $c_1, c_2, \ldots, c_n$, the unit $c_j$ is
connected, with weight 1, to the $c_j$-unit in the input layer of MU and is also
connected, with weight 1, to any unit $c_k$ in the input layer of MU for which
$c_j <_\times c_k$. For a negated element $d = \neg c$ in the product, we connect, with
weight 1, $c_j$ to the unit representing $\neg c_j$ in the input layer of MU and also,
with weight 1, to any unit $c_k$ in the input layer of MU for which $\neg c_j <_\times c_k$.


7.6.7 Proposition A multiplication or conjunction unit MU computes the 
value of a product of l elements of C when it is connected to an input layer 
as just described. 


7.6.8 Example Consider again $C$ = FOUR, and input the two elements
$u$ and $b$ to a multiplication unit MU, where $l = 2$. It is readily checked
that the potentials of the units $u$, $t$, and $b$ in the input layer of MU are,
respectively, $-0.5$, $-1.5$, and $-0.5$; their outputs are all equal to 0; and the
outputs of the units $u$, $t$, and $b$ in the output layer of MU are also all equal
to 0. On the other hand, the $f$-unit in the input layer of MU has potential
$1 \times 1 + 1 \times 0 + 1 \times 0 + 1 \times 0 + 1 \times 0 + 1 \times 0 + 1 \times 0 + 1 \times 1 - 1.5 = 0.5$, and
therefore the output of this unit is $H(0.5) = 1$. Furthermore, the input to the
$f$-unit in the output layer of MU is $-1 \times 0 + 1 \times 1 - 1 \times 0 - 1 \times 0 = 1$. Hence,
the output of this unit is $H(1 - 0.5) = 1$, and so MU outputs 0100 or $f$, and
this indeed is the value of $u \wedge b$, as required.
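
The arithmetic of Example 7.6.8 can be reproduced with a small simulation of a
conjunction unit for FOUR. The Python sketch below is one possible reading of
Definition 7.6.6 and the wiring described after it, restricted to unnegated
inputs. It assumes the one-hot encoding of Example 7.6.5, a step function that
outputs 1 exactly when the potential (weighted input minus threshold) is
positive, and the truth ordering $f < u, b < t$; the function names are
hypothetical.

```python
FOUR = ("u", "f", "t", "b")


def conj(x, y):
    """Binary conjunction in FOUR."""
    if x == y:
        return x
    if x == "t":
        return y
    if y == "t":
        return x
    return "f"


def strictly_below(s, t):
    """Strict induced order: s differs from t and conj(s, t) equals t."""
    return s != t and conj(s, t) == t


def heaviside(potential):
    return 1 if potential > 0 else 0


def one_hot(c):
    return [1 if c == d else 0 for d in FOUR]


def multiplication_unit(elements):
    """Return the one-hot output of the unit for the given unnegated inputs."""
    count = len(elements)
    vectors = [one_hot(c) for c in elements]
    # Input layer of MU: each unit has threshold count - 0.5 and receives
    # weight 1 from its own value and from every value strictly below it.
    hidden = []
    for ck in FOUR:
        total = sum(
            x[j]
            for x in vectors
            for j, cj in enumerate(FOUR)
            if cj == ck or strictly_below(cj, ck)
        )
        hidden.append(heaviside(total - (count - 0.5)))
    # Output layer: threshold 0.5, weight 1 from the matching input-layer unit
    # and weight -1 from every input-layer unit strictly below it.
    output = []
    for j, cj in enumerate(FOUR):
        total = hidden[j] - sum(
            hidden[i] for i, ci in enumerate(FOUR) if strictly_below(ci, cj)
        )
        output.append(heaviside(total - 0.5))
    return output


def decode(vector):
    return FOUR[vector.index(1)]


print(multiplication_unit(["u", "b"]), decode(multiplication_unit(["u", "b"])))
```

In this reading an input-layer unit fires exactly when every factor lies at or
below its value in the induced order, and the output layer then keeps only the
least such value, in line with Proposition 7.6.4.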


The ideas behind multiplication units work, with minor changes, for
addition or disjunction, and we obtain addition ($+$) or disjunction ($\vee$) units AU
which compute the sum or disjunction of $k$, say, elements of $C$.

We are now in a position to state the main theorem of this section, where
we take the set $C$ to be a logic $\mathcal{T}$ endowed with the operations of disjunction
($\vee$) and conjunction ($\wedge$).


7.6.9 Theorem Suppose that both operations of disjunction and conjunction
in $\mathcal{T}$ are finitely determined and that $P$ is a propositional logic program
defined over $\mathcal{T}$. Then we can construct a 3-layer feedforward neural network
$F$ which contains multiplication units in its middle layer and addition units
in its output layer such that $F$ computes $F_P$.


In closing this section, we mention that there is yet another class of logic 
programs one can consider in our present context of extending Theorem 7.4.1, 
namely, the class of propositional annotated (bi)lattice-based logic programs. 
This class is also a very general class of programs capable of handling uncer- 
tainty, in this case using lattices and bilattices to model belief estimates for 
and against a proposition. However, its study would take us too far from our 
current goal, and instead we refer the reader to [Komendantskaya et al., 2007] 
again for full details. 
7.7 Some Extensions — The First-Order Case


So far, in this chapter we have described how certain methods developed 
in earlier chapters give rise to approaches to the problem of integrating logic 
programs and artificial neural networks. The key insight into this integration 
is the observation that the two paradigms can be formally related by means of 
functions: on the one hand, semantic operators for logic programs capture the 
meaning of logic programs; on the other hand, the input-output function of 
an artificial neural network completely characterizes its functional behaviour. 
Approaches to neural-symbolic integration thus arise out of methods which 
allow us to understand semantic operators as I/O functions of artificial neural 
networks, and vice-versa. 

Most of this chapter has focused on the single-step operator in logic
programming which, via its fixed points, determines the supported model
semantics of logic programs. However, in Section 7.6, we have just seen that some of
these methods carry over to other semantics, for propositional programs, via
the computation of $F_P$. In this section, we will now consider the first-order
case and extensions of the approximation results we have established for $T_P$ to
$F_P$ and other semantic operators. At the same time, we briefly discuss further
alternative semantics as treated throughout the book and discuss conclusions
which can be drawn concerning neural-symbolic integration in general.

Our conceptual starting point is Theorem 7.5.3, which tells us that
approximating networks exist if and only if the single-step operator is continuous in
the Cantor topology.⁴² We can now use this result to leverage several new
results on the relationship between the supported model semantics and other
semantics in order to derive similar results for these other semantics.

In Section 5.4, we considered a very general family of semantic operators 
and also examined the question of how one may characterize Cantor continuity 
for them. The following result is thus an easy corollary of Theorem 5.4.7. 


7.7.1 Theorem Let P be a program with a locally finite local consequence 
operator T. Then T can be uniformly approximated by 3-layer feedforward 
networks in the sense of Theorem 7.2.2. 


42We briefly remark that a result established in [Hornik et al., 1989], which states that 
every measurable function can be approximated almost everywhere by a 3-layer sigmoidal 
feedforward network, is not necessarily useful for our purposes. This is so despite the fact 
that it was shown in Theorem 5.5.1 that many semantic operators, including the single-step 
operator, are always measurable, and hence also the Gelfond–Lifschitz operator, see
Theorem 6.2.4. However, it should be noted that the Cantor set is a set of (Lebesgue) measure zero
when viewed as a subspace of the reals. Thus, the result just quoted of [Hornik et al., 1989] 
need not necessarily lead to useful approximation results for these operators. Indeed, such 
approximations arising from the result of [Hornik et al., 1989] may fail to approximate the 
operator in question at every point. Nevertheless, it remains to be investigated whether non- 
zero measures exist on the Cantor set, which yield useful approximations in conjunction with 
the results of [Hornik et al., 1989]. 
Theorem 7.7.1 covers, among others, all Fitting-style operators from
Section 5.2.1.

The fixpoint completion, studied in Section 6.1, also turns out to be very
useful, since it allows us to reduce treatments of the Gelfond–Lifschitz
operator to the single-step operator. According to Theorem 7.2.2, we are first of
all interested in carrying over continuity results with respect to the Cantor
topology. From Theorem 6.2.2 we thus obtain the following result.


7.7.2 Theorem Let $P$ be a normal logic program, and let the following
condition be satisfied for all $I \in I_P$ and $A \in B_P$: whenever $GL_P(I)(A) = f$,
then either there is no clause with head $A$ in ground($P$) or there exists a
finite set $S(I, A) = \{A_1, \ldots, A_k\} \subseteq B_P$ such that $I(A_i) = t$ for all $i$, and
for every clause $A \leftarrow \mathit{body}$ in ground($P$) at least one $\neg A_i$ or some $B$ with
$GL_P(I)(B) = f$ occurs in $\mathit{body}$. Then $GL_P$ can be uniformly approximated
by 3-layer feedforward networks in the sense of Theorem 7.2.2.


We also obtain the following corollary, taking Corollary 6.2.3 into account. 


7.7.3 Corollary Let $P$ be a covered normal logic program. Then $GL_P$ can
be uniformly approximated by 3-layer feedforward networks in the sense of
Theorem 7.2.2.


Likewise, the remark given in Footnote 3 on page 170 of Chapter 6 together 
with Lemma 5.4.12 allows us to derive similar characterizations of continuity 
for the operator characterizing three-valued stable models. 

In principle, one can use the results recorded earlier to embark on inves- 
tigations similar to those undertaken in Sections 7.5.3 to 7.5.5, for example. 
However, a direct application of the results for the Gelfond–Lifschitz
operator is hardly satisfactory since the computation of the fixpoint completion
can only be carried out in an approximate manner. How one deals with this 
problem, and what it entails, remains to be investigated. 

It should be clear from Section 7.6 how one carries over the approach 
using a finite subset of the grounding of a program to other locally finite 
local consequence operators. For operators like the Gelfond–Lifschitz operator,
however, a straightforward approach is rather unsatisfactory due to the fact
that one iteration of the Gelfond–Lifschitz operator involves the taking of
a limit of the single-step operator for definite programs. As an alternative 
approach, we could again first compute the fixpoint completion of the program 
and employ Theorem 6.1.4 in conjunction with the methods from Section 7.6, 
but alas, we have noted already that computation of the fixpoint completion 
can only be done in an approximate manner. How to deal with this problem in 
an appropriate manner again is something which remains to be investigated. 


Before closing this chapter, we would like to remark that there is a plethora
of work which has been done on the integration of logic and neural networks.⁴³
In particular, the propositional core method (see Section 7.4) has spawned
a lot of investigations, including extended semantics for propositional logic
programs.⁴⁴ But alternative methods are also under investigation which do
not connect directly with the investigations into the mathematical foundations
of logic programming we have presented in this book.⁴⁵


43See, for example, [Bader and Hitzler, 2005, Hammer and Hitzler, 2007] for overviews.


44Most notable is the body of work done by Artur d’Avila Garcez of City Univer- 
sity London on, for example, modal logic, see [d’Avila Garcez et al., 2007], intuitionistic 
logic, see [d’Avila Garcez et al., 2006], and epistemic and temporal logic, see, for example, 
[d’Avila Garcez and Lamb, 2006]. See also [d’Avila Garcez et al., 2009]. 

45For further notable recent work based on methods other than those reported here, the
reader should consult the following papers [Gust et al., 2007, Hölldobler and Ramli, 2009,
Komendantskaya, 2010, Guillame-Bert et al., 2010].


Chapter 8 


Final Thoughts 


In this book, we have provided a comprehensive treatment of logic program- 
ming semantics from the perspective of fixed-point semantics. In doing so, we 
have covered a lot of material which also relates to other areas of interest 
outside the realm of logic programming as such. In this final chapter, we dis- 
cuss contributions to and relationships between the content of this book and a 
rather diverse mix of topics, ranging from foundations of computing via arti- 
ficial intelligence to cognitive science. We do so with the usual understanding 
that the impact of foundational research is more often than not indirect in 
nature in providing results, methods, and insights, which can be carried for- 
ward by research communities at large until a critical mass is reached, thereby 
enabling significant or even major advances to take place. 


8.1 Foundations of Programming Semantics 


The classical semantic analysis of programs in the sense of denotational 
semantics is based on monotonic, order-continuous operators, via their least 
fixed points using Theorem 1.1.9 or Theorem 1.1.10. This approach, however, 
fails for paradigms where the semantics is expressed by fixed points of op- 
erators which are not monotonic in general. In particular, it fails for logic 
programming in several of its variants, as studied throughout this book. 

By developing methods for the fixed-point semantic analysis of programs 
with non-monotonic semantic operators, we therefore widen the scope of ap- 
plicability of fixed-point semantics. In particular, we provide sufficiency con- 
ditions for the existence of fixed points (Chapter 4) and show how they can 
be applied to various semantics based on non-monotonic operators (Sections 
5.1 and 5.4 and Chapter 6). 


It seems evident that these methods should carry over to other such 
paradigms. However, a limitation of some of the work presented in this book 
is that certain of the fixed-point theorems provided in Chapter 4 always guar- 
antee the existence of a unique fixed point, if there is a fixed point at all, 
thus rendering the theorems in question of limited applicability to paradigms 
(or programs) where multiple fixed points are the norm. The latter situation
is encountered in the logic programming paradigm in the case of the stable
model semantics, for example, and indeed our analysis in Section 6.2 is lim- 
ited in this respect. Multiple fixed points also arise naturally in the context of 
disjunctive logic programs, that is, logic programs where additionally disjunc- 
tions of atoms are allowed in rule heads as discussed briefly in Section 4.9. The 
application of fixed-point theorems for multivalued mappings as provided, for 
example, in Sections 4.9 to 4.13 may provide a remedy when this line of work 
has been fully worked out. In particular, an approach to this problem based on 
the Rutten-Smyth theorem and a careful analysis and choice of quasimetrics 
(perhaps based on level mappings) holds out considerable prospects in this 
respect, see [Seda, 1997]. In addition, approaches such as the one mentioned
in Section 4.14 also overcome this problem to a considerable extent.¹


8.2 Quantitative Domain Theory 


Domain theory,² based on order continuity of semantic operators, is the
dominant theory underlying the denotational semantics of programming
languages. However, an alternative tradition in the semantics of programming
languages to that using domains is an approach based on the use of metric
spaces, as already mentioned in Chapter 4.³ A reconciliation of these two
approaches is of obvious interest for the theory of programming semantics, and a
considerable body of work has been done on this very topic,⁴ resulting in the
area of quantitative domain theory. Indeed, the Rutten-Smyth theorem arose
out of precisely these considerations.

In contrast to mainstream work on this reconciliation, which is driven by 
a mainly conceptual motivation to unite two theories, the work presented 
in this book is driven by a clear application, namely, the semantic analysis 
of logic programming. In pursuing this application, we have developed sev- 
eral results which provide conceptual insights into the relationships between 
domain-theoretic semantics and metric semantics. A key role is played by the 
Scott and Cantor topologies (Chapter 3) as the underlying spaces. Another 
key role is played by the relationship between ordered spaces and (generalized) 
metric spaces (Section 4.8) and by the various fixed-point theorems which can 
be provided for these spaces (Chapter 4), some of which have been taken 
directly from work on quantitative domain theory. 

A theme which has not been taken up in this book in detail and which
provides scope for further work is to investigate more closely how the
application-driven work of quantitative domain theory, as discussed herein,
relates to theory-driven advances in the same topic, which were developed in
parallel.⁵

¹ See also [Hitzler and Seda, 1999c, Straccia et al., 2009] for some more
investigations into these matters.

² See [Scott, 1982a].

³ See [de Bakker, 2002].

⁴ Initiated by work such as [Smyth, 1987].


8.3 Fixed-Point Theorems for Generalized Metric Spaces 


Fixed-point theorems have a rightful place in the core arsenal of mathemat- 
ical tools applicable to theoretical computer science, with many applications 
outside this realm of course. The Banach contraction mapping theorem, The- 
orem 4.2.3, which is the starting point for many of the investigations in this 
book, is one of the most fundamental of these theorems. 

In this book, we have contributed to the study of fixed-point theory by ex- 
ploring generalized metrics and thereby providing a compilation of extensions 
of the Banach theorem, together with results concerning their relationships 
with order-theoretic fixed-point theorems (Chapter 4). We furthermore pro- 
vide evidence of the usefulness of these theorems by applying them, throughout 
the book, to the study of the semantics of logic programs. 

Another theme which has not been taken up here, and which again provides scope
for further work, is a systematic investigation of the extent to which the Banach
theorem, and its relatives, remains valid with respect to generalized distance 
functions under weaker and weaker conditions. Specifically, how weak can 
the ambient spaces be and still support a reasonable version of the Banach 
theorem? 


8.4 The Foundations of Knowledge Representation and 
Reasoning 


Knowledge Representation and Reasoning (KR) is one of the classical
branches of Artificial Intelligence. Currently, it is experiencing massive
renewed interest due to the advent of the Semantic Web.⁶ In a nutshell, the
Semantic Web strives to improve the World Wide Web by making Web content
machine-understandable, and it does so by using KR methods, more precisely,
by endowing Web content with additional meta-content in the form of
knowledge bases (so-called ontologies), which describe the content in a
logic-based format.

Several KR languages have been developed and standardized by the World
Wide Web Consortium⁷ for this purpose. One of them, called RIF,⁸ is
essentially a logic programming language, and other ontology languages can
also be understood as logic programming variants.⁹

⁵ The papers [Waszkiewicz, 2002, Waszkiewicz, 2003, Waszkiewicz, 2006,
Krötzsch, 2006, Künzi et al., 2006, Künzi and Kivuvu, 2008], for example, may
be consulted.

⁶ See [Hitzler et al., 2009b] for an introductory textbook.

In the light of such recent developments, theoretical investigations into 
logic programming, as provided in this book, gain further interest. It can be 
conjectured that the methods developed in this book may be used for design 
and analysis of new KR languages suitable for application purposes. 

Conceptually interesting from this point of view is the observation that
the methods of analysis provided herein are close to a denotational
semantics approach and thus complement the historically model-theory-driven
semantics in KR languages. In particular, there may be scope for the study
of decidability and/or semi-decidability of KR languages based on the
level-mappings approach discussed in Chapter 2,¹⁰ a topic which has so far
been largely neglected for logic programming, although it has played a major
role in the development of the currently main ontology language, the Web
Ontology Language OWL.¹¹


8.5 Clarifying Logic Programming Semantics 


In this book, we have covered the most important semantics for normal 
logic programs. However, many more different semantics for normal logic pro- 
grams and generalizations of this paradigm have been defined in the literature. 
The rationale behind these various semantics has been manifold, depending 
on one’s point of view, which may be that of a programmer or inspired by 
commonsense reasoning. Consequently, the constructions which lead to these 
semantics are technically very diverse, and the exact relationships between 
them have not yet been fully understood. 

Our work, and in particular the treatment in Chapter 2, but also Sec- 
tion 5.2, provides a uniform perspective of different logic programming seman- 
tics, and it should be clear from the proofs that the approach adopted there 
can be lifted to other fixed-point semantics, in particular to those involving 
monotonic operators. It thus reconciles these semantics within an overarching 
framework which can be used for easy comparison of semantics with respect 
to syntactic structures that can be employed with them, that is, to deter- 
mine the extent to which a semantics is able to break up positive or negative 
dependencies or loops between atoms in programs. 


It still remains to be seen, however, how far this approach can be carried,¹²
and whether or not it is possible to establish a meta-theory which goes beyond
mere characterization.¹³

⁷ W3C, http://www.w3.org/

⁸ See [Boley and Kifer, 2010, Hitzler et al., 2009b].

⁹ OWL RL [Hitzler et al., 2009a, Reynolds, 2010], ELP [Krötzsch et al., 2008],
or F-Logic [Kifer et al., 1995], for example.

¹⁰ For a preliminary investigation into this, see [Cherchago et al., 2007].

¹¹ See [Hitzler et al., 2009a, Hitzler et al., 2009b].


8.6 Symbolic and Subsymbolic Representations 


How to overcome the gap between symbolic and subsymbolic represen- 
tations, and how to integrate them in an efficient and effective manner, is 
a topic of growing interdisciplinary importance. It is driven by advances in 
neuroimaging, which call for the modelling of findings in neuroscience on a 
higher and higher level of abstraction, and by the search in Cognitive Science 
for suitable cognitive architectures to model complex behaviour. By symbolic 
we mean, of course, knowledge representation formalisms based on logic or 
similar algebraic structures, while the term subsymbolic refers to paradigms 
such as artificial neural networks, where knowledge is not represented in a 
crisp, declarative way. 

The topology-driven view of logic programming semantics which we pursue 
herein indirectly embraces this theme by providing a conceptual bridge be- 
tween the discrete (symbolic) world of logic and the continuous (subsymbolic) 
world of topology and analysis on the reals. 

While, originally, we developed this point of view purely for the purposes of 
analyzing logic programs and in order to advance quantitative domain theory, 
it bears, at least conceptually, on the symbolic/subsymbolic issue. However, we 
have not pursued this in any structured manner, apart from developing neural- 
symbolic integration (Chapter 7), albeit with a different initial motivation 
(see Section 8.7). The question remains open to what extent our insights can 
contribute to the larger quest. 


8.7 Neural-Symbolic Integration 


Our work on neural-symbolic integration started as a straightforward
application of our topological approach to logic programming semantics. The
pursuit (Chapter 7) was then driven mainly by an engineering motivation (as
opposed to a cognitive science motivation as discussed in Section 8.6), that
is, by the idea of combining logic programming and artificial neural networks
in such a way that the best of both worlds (declarativeness, trainability,
robustness, and reasoning capabilities) is retained.

¹² Disjunctive well-founded semantics were compared using this approach in the
paper [Knorr and Hitzler, 2007], but only with limited success since the
characterizations became rather complicated.

¹³ In [Cherchago et al., 2007], for example, level mappings were used to study
decidability properties.


This effort has paid off: while we provide only the theoretical
underpinnings in Chapter 7, we are indeed able to show that a declarative,
trainable, and robust system with reasoning capabilities can be developed on
these grounds,¹⁴ although it has to be said that the advance remains
conceptual in nature because the system is severely limited in terms of the
size of the knowledge base involved. Nevertheless, it is to date one of only
two reported systems with these capabilities.¹⁵


Significant further advances on this front, in particular with respect to
the integration of learning and reasoning, would be highly appreciated in
practice.¹⁶


8.8 Topology, Programming, and Artificial Intelligence 


It has been argued that there is a strong relationship between topological
dynamics (chaos theory), logic programming, neural networks, and other
paradigms, and in particular this is so in the context of emergent behaviour
as represented by cellular automata, say.¹⁷ Indeed, from a bird's-eye
perspective each seems to be capable of being mapped onto the others. At the
same time, the study of any one of these paradigms seems to pose the same sort
of obstacles found in the others, particularly in relation to the handling
of chaotic dynamics and emergence.


Some of the work in this book contributes to this discussion, especially with
respect to topological dynamics, logic programming, and neural networks, as
discussed in Section 7.5.¹⁸ Obviously, this is only a small stepping stone in
the pursuit of these issues which, once fully understood, will provide a major
advance in our overall understanding of complex phenomena. However, we may
not yet have the mathematical tools available to really understand them.¹⁹

¹⁴ See [Bader et al., 2007, Bader et al., 2008, Bader, 2009] for details.

¹⁵ The approach in [Gust et al., 2007] achieves similar results with entirely
different methods.

¹⁶ For a discussion of the Semantic Web (see Section 8.4) as a potential test
case for neural-symbolic integration, see [Hitzler et al., 2005]. For a general
discussion of the need for the integration of learning and reasoning for
Semantic Web applications, see [Hitzler, 2009, Hitzler and van Harmelen, 2010].

¹⁷ See, for example, [Blair et al., 1997a, Blair et al., 1999].

¹⁸ In [Bader and Hitzler, 2004], it was shown that there is indeed a tight
relationship between logic programs and fractals in the sense in which they
arise as attractors of iterated function systems.

¹⁹ This last sentence is a citation from a keynote talk given by Howard A.
Blair, of Syracuse University, at the MFCSIT2000 conference in Cork, Ireland.


Appendix 


Transfinite Induction and General 
Topology 


In order to help make our discussions relatively self-contained, it will be
convenient to collect together in this Appendix the basic facts and notation we
need from the theory of ordinals¹ and from the subject of general topology.


A.1 The Principle of Transfinite Induction 


We begin with a brief discussion of the theory of ordinals and transfinite 
induction. In particular, we give a statement of the principle of transfinite 
induction in the form in which we make use of it on a number of occasions. 


A.1.1 Definition A partially ordered set X is well-ordered or is a well- 
ordering if each non-empty subset of X has a first or least element. 


A.1.2 Example (1) The set $\mathbb{N}$ of natural numbers is well-ordered in
the usual ordering $\le$ on $\mathbb{N}$.


(2) The set $\mathbb{Z}$ of integers is not well-ordered in the usual ordering
$\le$ on $\mathbb{Z}$.
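
Definition A.1.1 can be tested mechanically on a finite partially ordered set by inspecting every non-empty subset. The following Python sketch is an illustration only: the helper names `least_element` and `is_well_ordered`, and the choice of divisibility as the example ordering, are assumptions made here and do not come from the text.

```python
from itertools import combinations


def least_element(subset, leq):
    """Return the least element of subset under leq, or None if there is none."""
    for a in subset:
        if all(leq(a, b) for b in subset):
            return a
    return None


def is_well_ordered(elements, leq) -> bool:
    """Check Definition A.1.1 on a finite set: every non-empty subset has a least element."""
    elements = list(elements)
    for r in range(1, len(elements) + 1):
        for subset in combinations(elements, r):
            if least_element(subset, leq) is None:
                return False
    return True


def divides(a: int, b: int) -> bool:
    """Illustrative partial order on positive integers: a precedes b if a divides b."""
    return b % a == 0


print(is_well_ordered([1, 2, 4, 8], divides))   # a chain under divisibility
print(is_well_ordered([1, 2, 3, 6], divides))   # the subset {2, 3} has no least element
```

The second call fails because 2 and 3 are incomparable under divisibility, so the subset {2, 3} has no least element.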


A.1.3 Lemma The following statements hold. 
(a) Every well-ordered set is linearly ordered. 
(b) No well-ordered set contains an infinite strictly descending sequence. 


Proof: (a) Let $(X, \le_X)$ be a well-ordered set, and let $x, y \in X$. Then
the set $\{x, y\}$ is a non-empty subset of $X$ and hence has a least element,
$x$, say. But then $x \le_X y$, which establishes (a).

For (b), suppose that $(x_n)_{n \in \mathbb{N}}$ is an infinite strictly
decreasing sequence in the well-ordered set $(X, \le_X)$. Then
$\{x_n \mid n \in \mathbb{N}\}$ itself is a non-empty subset of $X$ which has
no least element, contradicting the hypothesis that $(X, \le_X)$ is
well-ordered. ∎

¹ Our treatment of these matters is informal and non-axiomatic and is in the
spirit of the book [Halmos, 1998] to which we refer the reader for further
details.


Given two well-ordered sets $(X, \le_X)$ and $(Y, \le_Y)$, we call
$f \colon X \to Y$ monotonic if $a \le_X b$ implies $f(a) \le_Y f(b)$ for all
$a, b \in X$. If $f$ is also injective, then $f$ is called an embedding of $X$
into $Y$. If $f$ is both monotonic and bijective, then $f$ is called an order
isomorphism between $X$ and $Y$, and in this case the two well-orderings $X$
and $Y$ are called isomorphic. Note that all these definitions are consistent
with the definitions concerning orderings made in Chapter 1.


A.1.4 Definition Suppose that $(X, \le_X)$ is a well-ordered set and that
$x_0 \in X$. We call the set $I = I(x_0) = \{x \in X \mid x \le_X x_0\}$ the
initial segment of $X$ determined by $x_0$. We call an initial segment $I$ of
$X$ a proper initial segment if $I$ is a proper subset of $X$.


A.1.5 Definition Suppose that $(X, \le_X)$ and $(Y, \le_Y)$ are two
well-ordered sets. Then we write $X \le Y$ if $X$ is isomorphic to an initial
segment of $Y$. We write $X < Y$ if $X$ is isomorphic to a proper initial
segment of $Y$.


A.1.6 Theorem For any two well-ordered sets $X$ and $Y$, exactly one of the
following statements holds.


(a) $X < Y$.
(b) $X > Y$.
(c) $X$ and $Y$ are isomorphic.


Proof: We first prove the following statement.
(1) No well-ordered set $(Z, \le_Z)$ is isomorphic to a proper initial segment
of itself.

To see this, suppose $f \colon I \to Z$ is an isomorphism, where $I$ is a
proper initial segment of $Z$. Then we cannot have $f(x) = x$ for all
$x \in I$; otherwise, $f$ would not be surjective. Let $x_0$ be the least
element of the set of those elements $x$ of $I$ such that $f(x) \ne x$. Noting
in particular that $f(x_0) \ne x_0$, we see that we cannot have
$f(x_0) <_Z x_0$; otherwise $f(f(x_0)) = f(x_0)$ by minimality of $x_0$, and
this yields the contradiction that $f$ is not injective. Hence it must be the
case that $x_0 <_Z f(x_0)$. Now let $x_1 \in I$ be such that $f(x_1) = x_0$.
Then $x_1 \ne x_0$ because $f(x_0) \ne x_0$. If $x_1 <_Z x_0$, then by
definition of $x_0$ again, we obtain $x_0 = f(x_1) = x_1 <_Z x_0$, which is
impossible. If $x_0 <_Z x_1$, then $f(x_1) = x_0 <_Z f(x_0)$, which contradicts
the monotonicity of $f$. Hence, Statement (1) holds.

We will also need the following statement.

(2) Suppose the well-ordered sets $(W, \le_W)$ and $(Z, \le_Z)$ are isomorphic.
Then there is a unique isomorphism $f \colon (W, \le_W) \to (Z, \le_Z)$.

In order to see this, suppose $f, g \colon W \to Z$ are isomorphisms. We show
$f = g$. Assume this is not the case, and let $w_0 \in W$ be the
$\le_W$-least $w$ such that $f(w) \ne g(w)$; suppose in fact that
$f(w_0) <_Z g(w_0)$ for the sake of argument. Let $w_1 \in W$ be such that
$g(w_1) = f(w_0)$. Then $w_1 \ne w_0$. If $w_1 <_W w_0$, then by minimality of
$w_0$ and monotonicity of $f$, we obtain
$g(w_1) = f(w_1) <_Z f(w_0) = g(w_1)$, which is impossible. If $w_0 <_W w_1$,
then by monotonicity of $g$, we have
$f(w_0) <_Z g(w_0) <_Z g(w_1) = f(w_0)$, which is also impossible. So
Statement (2) holds.

We now turn to the proof of the theorem. 

Define the relation $R$ from $X$ to $Y$ by $R(x, y)$ if and only if the
initial segments $\{w \in X \mid w \le_X x\}$ and
$\{v \in Y \mid v \le_Y y\}$ are isomorphic, where $x \in X$ and $y \in Y$.
First note that $R(x, y_1)$ and $R(x, y_2)$ imply $y_1 = y_2$ by Statement (1).
So $R$ is a partial function. By symmetry, transitivity, and (1) again, $R$ is
also injective.

We next show that $\mathrm{dom}(R)$ is an initial segment of $X$. Suppose
$x_2 \in \mathrm{dom}(R)$, say, $R(x_2, y_2)$, and let $x_1 \le_X x_2$. Let $f$
be the isomorphism between the initial segments corresponding to $x_2$ and
$y_2$. Then the initial segments corresponding to $x_1$ and $f(x_1)$ are also
isomorphic, so $R(x_1, f(x_1))$, and hence $x_1 \in \mathrm{dom}(R)$. We have
also shown that $R$ is order-preserving.

A similar argument shows that the range of $R$ is an initial segment of $Y$.
Hence, $R$ is an isomorphism from an initial segment $I(x_0)$, say, of $X$ to
an initial segment $J(y_0)$, say, of $Y$; thus, $R(x_0, y_0)$ holds.

Now consider the following cases. If $I(x_0) = X$, but $J(y_0) \ne Y$, then
case (a) holds. If $I(x_0) \ne X$, but $J(y_0) = Y$, then case (b) holds. If
$I(x_0) = X$ and $J(y_0) = Y$, then case (c) holds. Suppose finally that
$I(x_0) \ne X$ and $J(y_0) \ne Y$. Let $x_1$ be the first element of
$X \setminus I(x_0)$ and $y_1$ be the first element of $Y \setminus J(y_0)$;
then $x_1$ is not in the domain of $R$ (and $y_1$ is not in the range of $R$).
But clearly $I(x_0) \cup \{x_1\}$ is the initial segment $I(x_1)$ and
$J(y_0) \cup \{y_1\}$ is the initial segment $J(y_1)$, and, furthermore,
$I(x_1)$ and $J(y_1)$ are clearly isomorphic by an isomorphism which, by (2),
must be an extension of $R$. We therefore obtain the contradiction that $x_1$
is in the domain of $R$.

Hence, only one of (a), (b), (c) holds, as required. ∎


Next, we state without proof a well-known theorem usually attributed to 
E. Zermelo. This theorem has the consequence that any set is a carrier set for 
some ordinal, see [Halmos, 1998] for details. 


A.1.7 Theorem (The Well-Ordering Theorem) Every set can be well- 
ordered. 


A.1.8 Definition An ordinal or ordinal number is an equivalence class of a 
well-ordering under the equivalence relation of isomorphism. 


The ordinals themselves can be ordered as follows. First, for any
well-ordered set $A$, let $\#A$ denote the equivalence class of $A$ under the
equivalence relation of isomorphism. Suppose that $\alpha = \#A$ and
$\beta = \#B$ are ordinals. We define the ordering $\le$ on the ordinals by
$\alpha \le \beta$ if and only if $A \le B$, and we note that $\le$ is easily
seen to be well-defined on the ordinals. Furthermore, by Theorem A.1.6, the
ordering $\le$ is a partial order and, as we show next, is in fact a
well-order.


A.1.9 Lemma Let X be a linearly ordered partially ordered set which is not 
well-ordered. Then X contains an infinite strictly descending sequence. 


Proof: If $X$ is not well-ordered, then there exists a non-empty subset $X_0$
of $X$ which does not contain a least element. Choose some $x_0 \in X_0$, and
note that $X_1 = \{y \in X_0 \mid y < x_0\}$ is non-empty and does not contain
a least element. Now assume that some $x_i \in X_0$ has been chosen such that
the set $X_{i+1} = \{y \in X_0 \mid y < x_i\}$ is non-empty and does not
contain a least element. Then we can choose $x_{i+1} \in X_{i+1}$ arbitrarily
and obtain $x_{i+1} < x_i$ and also that
$X_{i+2} = \{y \in X_0 \mid y < x_{i+1}\}$ is non-empty and does not contain a
least element. By the inductive argument just given, we obtain an infinite
strictly descending sequence $(x_n)$, as required. ∎


A.1.10 Proposition Every set of ordinals is itself well-ordered by $\le$.


Proof: We begin by noting that if $\alpha$ and $\beta$ are ordinals such that
$\alpha \le \beta$ and $\alpha = \#A$ and $\beta = \#B$, then we can assume
without loss of generality that $A \subseteq B$; we will make use of this
observation in what follows.

Let $X$ be a set of ordinals which is not well-ordered. Then, by Lemma
A.1.9, $X$ contains an infinite descending sequence
$\alpha_0 > \alpha_1 > \alpha_2 > \cdots$ of ordinals. For each
$i \in \mathbb{N}$, suppose that $\alpha_i = \#A_i$ and that
$A_i \supseteq A_{i+1}$. Then for each $i \in \mathbb{N}$ there exists
$a_i \in A_i \setminus A_{i+1}$. Hence,
$\{a_i \mid i \in \mathbb{N}\} \subseteq A_0$ is a subset of $A_0$ without a
least element, which is impossible. ∎


It is common practice to identify any ordinal $\alpha$ with the set of all
ordinals $\beta$ such that $\beta < \alpha$; so, in these terms,
$\beta < \alpha$ if and only if $\beta \in \alpha$. We will follow this
practice in the following. In particular, when we speak of a mapping
$f \colon X \to \alpha$, where $\alpha$ is an ordinal, we mean, in fact, a
mapping $f \colon X \to \{\beta \mid \beta < \alpha\}$.

Ordinals fall into two classes. A successor ordinal is an ordinal $\alpha$
such that there is a greatest ordinal $\beta$ with $\beta < \alpha$. In this
case, $\alpha$ is called the successor of $\beta$ and may be denoted by
$\beta + 1$; we also call $\beta$ the predecessor of $\alpha$ and may denote it
by $\alpha - 1$. Any ordinal which is not a successor ordinal is called a limit
ordinal.

Any ordinal has a successor. To see this, let $\alpha$ be an ordinal and
identify it with the set of ordinals $\{\beta \mid \beta < \alpha\}$. Then
$\alpha \cup \{\alpha\}$ is an ordinal above $\alpha$ and indeed is the least
ordinal above $\alpha$ and therefore is the successor $\alpha + 1$ of
$\alpha$.
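
For finite ordinals, the identification of an ordinal with the set of its predecessors and the successor construction $\alpha \cup \{\alpha\}$ can be mirrored directly with nested frozen sets. The sketch below is purely illustrative; the function names `successor`, `finite_ordinal`, and `less_than` are hypothetical choices made here, and the representation only covers finite ordinals.

```python
def successor(alpha: frozenset) -> frozenset:
    """Return alpha ∪ {alpha}, the successor of the ordinal alpha."""
    return alpha | frozenset([alpha])


def finite_ordinal(n: int) -> frozenset:
    """Build the finite ordinal n by applying the successor n times to 0 = ∅."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha


def less_than(beta: frozenset, alpha: frozenset) -> bool:
    """Under the identification above, beta < alpha if and only if beta ∈ alpha."""
    return beta in alpha


three = finite_ordinal(3)
print(len(three))                            # number of ordinals below 3
print(less_than(finite_ordinal(2), three))   # is 2 < 3?
print(less_than(three, three))               # is 3 < 3?
```

Each ordinal built this way is exactly the set of the ordinals constructed before it, so membership and the strict order coincide.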

We next give an example containing details of some familiar ordinals.

A.1.11 Example It is easy to see that any finite set
$A = \{a_1, \ldots, a_n\}$, containing $n$ elements, can be well-ordered in
essentially one way. Thus, if $A$ and $B$ are any well-ordered sets containing
$n$ elements, then $A$ and $B$ are isomorphic. Standard notation for the finite
ordinals, together with canonical representatives for them, is as follows:
$0 = \#\emptyset$, $1 = \#\{\emptyset\}$,
$2 = \#\{\emptyset, \{\emptyset\}\}$,
$3 = \#\{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}$, etc. Thus,
we are using the same symbols $0, 1, 2, 3, \ldots$ to denote natural numbers
and ordinal numbers (as well as cardinal numbers), but the context in which
they occur will determine their meaning. Often, we consider an ordinal to be
the set of all its predecessors, as already noted, in which case we view the
ordinal $n$ as the set $\{0, 1, \ldots, n - 1\}$ for each $n$. Furthermore, $0$
is the least ordinal, $1$ is the successor of $0$, $2$ is the successor of $1$,
etc. Thus, we have $0 < 1 < 2 < 3 < \cdots$ as ordinals.

Turning now to ordinals determined by infinite sets, we note first that
infinite sets can be well-ordered in more than one way. For example, the set
$\mathbb{N}$ of natural numbers can be well-ordered by writing it as
$\{1, 3, 5, \ldots; 2, 4, 6, \ldots\}$ and ordering it from left to right. The
resulting well-order is clearly not isomorphic to $\mathbb{N}$ well-ordered by
the usual order on $\mathbb{N}$. Indeed, the first infinite ordinal or least
infinite ordinal, denoted by $\omega$, is the ordinal determined by
$\mathbb{N}$ in its usual order, that is, $\omega = \#\mathbb{N}$. Thus,
$\omega$ is the first limit ordinal. The successor of $\omega$ is
$\omega + 1 = \{0, 1, 2, \ldots, \omega\}$, the successor of which is
$\omega + 2 = (\omega + 1) + 1 = \{0, 1, 2, \ldots, \omega, \omega + 1\}$, etc.
The next, or second, limit ordinal is denoted by
$\omega 2 = \{0, 1, 2, \ldots, \omega, \omega + 1, \omega + 2, \ldots, \omega + n, \ldots\}$,
etc. In this way, the ordinals form a transfinite sequence, and indeed any
non-finite ordinal is sometimes called a transfinite number. Thus, we have
\[
0 < 1 < 2 < 3 < \cdots < \omega < \omega + 1 < \omega + 2 < \cdots
  < \omega 2 < \omega 2 + 1 < \omega 2 + 2 < \cdots
  < \omega 3 < \cdots < \omega n < \cdots
  < \omega\omega = \omega^2 < \omega^2 + 1 < \omega^2 + 2 < \cdots
\]
as ordinal numbers. Note also that all the ordinals we have so far displayed in
this example are determined by countable sets. The first uncountable ordinal is
denoted by $\omega_1$ and as a set is the uncountable well-ordered set
containing all the countable ordinals.
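
The rearranged ordering of the natural numbers mentioned in the example, odd numbers first and then even numbers, can be described by a sort key. The following sketch is an illustration under that reading; the names `rank` and `precedes` are hypothetical, and only a finite window of the natural numbers can be displayed.

```python
def rank(n: int) -> tuple:
    """Position of n in the ordering 1, 3, 5, ...; 2, 4, 6, ...

    Odd numbers are placed in block 0 and even numbers in block 1, so every
    odd number precedes every even number; inside a block the usual order applies.
    """
    return (0 if n % 2 == 1 else 1, n)


def precedes(m: int, n: int) -> bool:
    """True if m comes strictly before n in the rearranged well-order."""
    return rank(m) < rank(n)


print(sorted(range(1, 11), key=rank))                 # odd block, then even block
print([n for n in range(1, 11) if precedes(n, 2)])    # predecessors of 2 in the window
```

In this ordering the element 2 is preceded by every odd number, so it has infinitely many predecessors, whereas every element of the usual ordering has only finitely many. This is one way to see that the two well-orders are not isomorphic.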


We are now in a position to consider the principle of transfinite induction.
The reader may note that it is an extension, from $\mathbb{N}$ to arbitrary
well-ordered sets, of the well-known strong form² of the principle of
mathematical induction.


A.1.12 Theorem (Principle of Transfinite Induction) Suppose that $A$
is any well-ordered set and $B$ is a subset of $A$ which satisfies the
statement that $a \in B$ whenever $x \in B$ for all $x < a$. Then $B = A$.


Proof: If $B \ne A$, then $A \setminus B \ne \emptyset$. Since $A$, and
therefore any subset of $A$, is well-ordered, $A \setminus B$ has a first
element $x_0$, say. But now we have $x \in B$ for all $x < x_0$, and the
induction hypothesis leads to the conclusion that $x_0$ must belong to $B$.
This contradiction shows that $B = A$, as required. ∎

² Also known as course-of-values induction.


A.1.13 Corollary Suppose that $A$ is a well-ordered set and
$\{p(a) \mid a \in A\}$ is a set of statements indexed by $A$. Suppose further
that for all $b \in A$ it follows that $p(b)$ is true if $p(x)$ is true for all
$x < b$. Then $p(a)$ is true for all $a \in A$.


In fact, the form in which we will usually apply the principle of transfinite 
induction is as follows. 


A.1.14 Corollary Suppose that $p(\alpha)$ is a statement depending on the
ordinal $\alpha$. Suppose further that for all ordinals $\beta$, $p(\beta)$ is
true if $p(\gamma)$ is true for all $\gamma < \beta$. Then $p(\alpha)$ is true
for all ordinals $\alpha$.


When applying the principle of transfinite induction as a proof principle, as
formulated in Corollary A.1.14, it is usually convenient to split the argument
into two cases. The first of these is when $\beta$ is assumed to be a successor
ordinal, and the second is when $\beta$ is assumed to be a limit ordinal.
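
The case split just described can be written out schematically. The display below is a sketch in the notation of Corollary A.1.14. Listing $\beta = 0$ as a separate base case is an assumption about presentation made here: the induction hypothesis is vacuous at $0$, so $p(0)$ has to be established outright, whether or not it is folded into one of the other cases.

```latex
\begin{align*}
\text{Base case:} \quad
  & p(0) \text{ holds.} \\
\text{Successor case:} \quad
  & \beta = \gamma + 1 \text{ and } p(\delta) \text{ for all } \delta \le \gamma
    \;\Longrightarrow\; p(\beta). \\
\text{Limit case:} \quad
  & \beta \ne 0 \text{ a limit ordinal and } p(\gamma) \text{ for all } \gamma < \beta
    \;\Longrightarrow\; p(\beta). \\
\text{Conclusion:} \quad
  & p(\alpha) \text{ holds for all ordinals } \alpha.
\end{align*}
```

In the successor case the ordinals below $\beta = \gamma + 1$ are exactly those $\delta$ with $\delta \le \gamma$, which is why the hypothesis takes that form.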


A.2 Basic Concepts from General Topology 


We next turn to giving a brief overview of the general topology we need
at various points in our discussions.³ In addition, we include here the proofs
of the results we stated without proof in our treatment of the Scott topology
in Chapter 3.


A.2.1 Definition A topology on a set $X$ is a collection $\mathcal{T}$ of
subsets of $X$, called the open sets of $\mathcal{T}$, satisfying the following
properties.


(1) Any union of elements of $\mathcal{T}$ belongs to $\mathcal{T}$.
(2) Any finite intersection of elements of $\mathcal{T}$ belongs to $\mathcal{T}$.
(3) $\emptyset$ and $X$ belong to $\mathcal{T}$.


The pair $(X, \mathcal{T})$, or simply $X$ by an abuse of notation, is called a
topological space.
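
On a finite set the three axioms of Definition A.2.1 can be checked by brute force. The following Python sketch is an illustration only; the function name `is_topology` and the example families are assumptions introduced here. Because the family is finite, closure under pairwise unions and intersections is enough to give closure under all unions and all finite intersections.

```python
from itertools import combinations


def is_topology(X: frozenset, T: set) -> bool:
    """Check axioms (1)-(3) of Definition A.2.1 for a finite family T of subsets of X."""
    if frozenset() not in T or X not in T:
        return False                      # axiom (3)
    if any(not U <= X for U in T):
        return False                      # every member must be a subset of X
    for U, V in combinations(T, 2):
        if U | V not in T:                # axiom (1); pairwise unions suffice for finite T
            return False
        if U & V not in T:                # axiom (2); pairwise intersections suffice
            return False
    return True


X = frozenset({1, 2, 3})
chain = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
broken = {frozenset(), frozenset({1}), frozenset({2}), X}
print(is_topology(X, chain))     # nested open sets
print(is_topology(X, broken))    # the union {1} ∪ {2} is missing
```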


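A.2.2 Definition Given two topologies $\mathcal{T}_1$ and $\mathcal{T}_2$ on a
set $X$, we say that $\mathcal{T}_1$ is weaker or coarser than $\mathcal{T}_2$,
or that $\mathcal{T}_2$ is stronger or finer than $\mathcal{T}_1$, if
$\mathcal{T}_1 \subseteq \mathcal{T}_2$.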


Given a set $X$, the coarsest topology which can be defined on $X$ is the
indiscrete topology in which the only open sets are $\emptyset$ and $X$. At the
other extreme, the finest topology which can be defined on $X$ is the discrete
topology in which all subsets of $X$ are taken as open sets.

³ Our background references for the material we need from general topology are
the books [Kelley, 1975] and [Willard, 1970] to which we refer the reader for
proofs of the results we simply state.


A.2.3 Definition If $X$ is a topological space and $x \in X$, then a
neighbourhood of $x$ is a set $U$ containing an open set $V$ containing $x$,
that is, $x \in V \subseteq U$, where $V$ is open. The neighbourhood system
$\mathcal{U}_x$ of $x$ is the collection of all neighbourhoods of $x$.


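A.2.4 Definition A neighbourhood base at $x$ in the topological space $X$ is
a subcollection $\mathcal{B}_x \subseteq \mathcal{U}_x$ such that, for each
$U \in \mathcal{U}_x$, there exists $V \in \mathcal{B}_x$ satisfying
$V \subseteq U$. Thus,
$\mathcal{U}_x = \{U \subseteq X \mid V \subseteq U \text{ for some } V \in \mathcal{B}_x\}$.
The elements of $\mathcal{B}_x$ are called basic neighbourhoods of $x$.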


A.2.5 Theorem Let $X$ be a topological space, and, for each $x \in X$, let
$\mathcal{B}_x$ be a neighbourhood base at $x$. Then the following properties
hold.


(a) If $V \in \mathcal{B}_x$, then $x \in V$.

(b) If $V_1, V_2 \in \mathcal{B}_x$, then there is $V_3 \in \mathcal{B}_x$
satisfying $V_3 \subseteq V_1 \cap V_2$.

(c) If $V \in \mathcal{B}_x$, there is some $V_0 \in \mathcal{B}_x$ such that
if $y \in V_0$, then there is $W \in \mathcal{B}_y$ satisfying $W \subseteq V$.

(d) $G \subseteq X$ is open if and only if $G$ contains a basic neighbourhood
of each of its points.


Conversely, suppose that $X$ is a set and that a collection $\mathcal{B}_x$ of
subsets of $X$, called basic neighbourhoods of $x$, is assigned to each element
$x \in X$ in such a way that (a), (b), and (c) above are satisfied. If we then
define a set $G$ to be open if and only if it contains a basic neighbourhood of
each of its points, as in (d), we obtain a topology on $X$ in which
$\mathcal{B}_x$ is a neighbourhood base at $x$ for each $x \in X$.


A.2.6 Definition In a topological space $(X, \mathcal{T})$, a base for
$\mathcal{T}$ (or a base for $X$ by an abuse of terminology) is a collection
$\mathcal{B} \subseteq \mathcal{T}$ of subsets of $X$ such that each element of
$\mathcal{T}$ is a union of elements of $\mathcal{B}$. Equivalently,
$\mathcal{B}$ is a base for $\mathcal{T}$ if and only if whenever
$V \in \mathcal{T}$ and $x \in V$, there is $U \in \mathcal{B}$ such that
$x \in U \subseteq V$. Furthermore, a collection
$\mathcal{C} \subseteq \mathcal{T}$ is called a subbase for $\mathcal{T}$ (or a
subbase for $X$) if the collection of all finite intersections of elements of
$\mathcal{C}$ forms a base for $\mathcal{T}$.


A.2.7 Theorem A collection $\mathcal{B}$ of subsets of a set $X$ is a base for
a topology on $X$ if and only if the following conditions are satisfied.


(a) $\bigcup_{B \in \mathcal{B}} B = X$.

(b) Whenever $B_1, B_2 \in \mathcal{B}$ and $x \in B_1 \cap B_2$, there is
$B_3 \in \mathcal{B}$ satisfying $x \in B_3 \subseteq B_1 \cap B_2$.


Furthermore, any collection $\mathcal{C}$ of subsets of $X$ is a subbase for
some topology on $X$, namely, the topology formed by taking all arbitrary
unions of finite intersections of elements of $\mathcal{C}$.
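
The last statement of the theorem describes a two-stage construction: first all finite intersections of subbase elements, then all unions of the results. On a small finite set this can be carried out literally, as in the sketch below. The function name `topology_from_subbase` is hypothetical, the empty intersection is taken to be $X$ and the empty union to be $\emptyset$ (the usual conventions, assumed here), and the enumeration is exponential, so it is only suitable for tiny examples.

```python
from itertools import combinations


def topology_from_subbase(X: frozenset, C: list) -> set:
    """Return all unions of finite intersections of members of C, for finite X."""
    # Stage 1: finite intersections; the empty intersection is X by convention.
    base = {X}
    for r in range(1, len(C) + 1):
        for family in combinations(C, r):
            base.add(frozenset.intersection(*family))
    # Stage 2: unions of base elements; the empty union is the empty set.
    base_list = list(base)
    topology = {frozenset()}
    for r in range(1, len(base_list) + 1):
        for family in combinations(base_list, r):
            topology.add(frozenset().union(*family))
    return topology


X = frozenset({1, 2, 3})
C = [frozenset({1, 2}), frozenset({2, 3})]
for U in sorted(topology_from_subbase(X, C), key=lambda U: (len(U), sorted(U))):
    print(set(U))
```

Here the intersection {1, 2} ∩ {2, 3} = {2} enters the base even though it is not a member of the subbase itself.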


A.2.8 Theorem Suppose that $\mathcal{B}$ is a collection of open sets in a
topological space $X$. Then $\mathcal{B}$ is a base for $X$ if and only if, for
each $x \in X$, the collection
$\mathcal{B}_x = \{B \in \mathcal{B} \mid x \in B\}$ is a neighbourhood base at
$x$.


As noted in Definition A.2.1, the elements of $\mathcal{T}$ are called the
open sets in the given topology on $X$. By definition, we call a subset $F$ of
$X$ closed if its complement, $X \setminus F$, is open. It follows immediately
that $\emptyset$ and $X$ are closed sets, that any finite union of closed sets
is itself closed, and that an arbitrary intersection of closed sets is closed.
Therefore, given an arbitrary subset $E$ of $X$, the intersection
$\overline{E}$ of all the closed sets containing $E$ is a closed set, the
smallest closed set containing $E$, and is called the closure of $E$. Clearly,
a set $F$ is closed if and only if $F = \overline{F}$. Dually, one defines the
interior $U^{\circ}$ of a subset $U$ of $X$ to be the largest open set
contained in $U$, and it is of course the union of all the open sets contained
in $U$. Moreover, it is also clear that a set $O$ is open if and only if
$O = O^{\circ}$.

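A closure operator (also known as a Kuratowski, or topological, closure
operator) on a set $X$ is a mapping
${}^{c} \colon \mathcal{P}(X) \to \mathcal{P}(X)$, from the power set
$\mathcal{P}(X)$ of $X$ into itself, subject to the following axioms.


(1) $\emptyset^{c} = \emptyset$.

(2) $A \subseteq A^{c}$ for all $A \subseteq X$.

(3) $(A \cup B)^{c} = A^{c} \cup B^{c}$ for all $A, B \subseteq X$.

(4) $A^{c} = (A^{c})^{c}$ for all $A \subseteq X$.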


Just as the notion of an open set can be taken as basic in defining topolo- 
gies, so clearly can the notion of a closed set. More interesting is the fact that 
closure can be taken as fundamental, and indeed the characteristic properties 
of closure are precisely the four just stated in defining a closure operator, in 
the following sense. 


A.2.9 Theorem Let $X$ be a non-empty set, and let
${}^{c} \colon \mathcal{P}(X) \to \mathcal{P}(X)$ be a closure operator on $X$.
Then $\mathcal{T} = \{X \setminus A \mid A \subseteq X, A = A^{c}\}$ is a
topology on $X$, called the topology associated with ${}^{c}$, in which we have
$\overline{A} = A^{c}$ for each subset $A$ of $X$. Thus, $A^{c}$ is the
topological closure in $X$ of each subset $A$ of $X$ with respect to the
topology $\mathcal{T}$ associated with ${}^{c}$.
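
The correspondence between topologies and closure operators can be observed on a small finite space. In the sketch below the closure of a set is computed as the intersection of the closed sets containing it, the four closure axioms are then checked exhaustively, and the open sets are recovered as complements of the sets fixed by the closure. The names `closure`, `powerset`, and `satisfies_closure_axioms`, and the example topology, are assumptions made here for illustration.

```python
from itertools import combinations


def closure(A: frozenset, X: frozenset, T: set) -> frozenset:
    """Intersection of all closed sets (complements of members of T) containing A."""
    result = X
    for U in T:
        F = X - U                 # F is closed because U is open
        if A <= F:
            result = result & F
    return result


def powerset(X: frozenset) -> list:
    """All subsets of the finite set X."""
    return [frozenset(s) for r in range(len(X) + 1) for s in combinations(X, r)]


def satisfies_closure_axioms(X: frozenset, T: set) -> bool:
    """Check axioms (1)-(4) for the map A -> closure(A) on every subset of X."""
    def cl(A):
        return closure(A, X, T)

    subsets = powerset(X)
    return (
        cl(frozenset()) == frozenset()
        and all(A <= cl(A) for A in subsets)
        and all(cl(A | B) == cl(A) | cl(B) for A in subsets for B in subsets)
        and all(cl(cl(A)) == cl(A) for A in subsets)
    )


X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
print(set(closure(frozenset({2}), X, T)))     # smallest closed set containing 2
print(satisfies_closure_axioms(X, T))
recovered = {X - A for A in powerset(X) if closure(A, X, T) == A}
print(recovered == T)                         # complements of closed sets give T back
```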


A.3 Convergence


It is well known, see [Willard, 1970, Chapter 4], that sequences are not 
adequate to describe all basic notions in topological spaces other than in the 
class of first countable spaces (a topological space is called first countable if 
it has a countable neighbourhood base at each of its points). One therefore 
needs notions more general than that of sequence. Such generalizations are 
provided by nets and filters, either of which is adequate to describe all topo- 
logical concepts. Indeed, convergence itself can be taken as the fundamental 
concept in developing topology, see Theorem 3.1.3, and this is the point of 
view adopted in Chapter 3. However, we choose to work here only with nets 
for reasons already mentioned in Chapter 3. 


A.3.1 Definition A net in a set $X$ is a mapping
$s \colon \mathcal{I} \to X$, where $(\mathcal{I}, \le)$ or simply
$\mathcal{I}$ is a directed set in which the ordering $\le$ is reflexive and
transitive. For each $i \in \mathcal{I}$, we denote $s(i)$ by $s_i$ and denote
the net $s \colon \mathcal{I} \to X$ by $(s_i)_{i \in \mathcal{I}}$ or simply
by $(s_i)$ or just by $s_i$ if no confusion results. Similarly, sequences
$(s_n)_{n \in \mathbb{N}}$, being special cases of nets, may be denoted simply
by $(s_n)$ or $s_n$. Given a net $(s_i)_{i \in \mathcal{I}}$ in $X$ and an
element $i_0$ of $\mathcal{I}$, we call the set
$(s_i)_{i_0 \le i} = \{s_i \mid i_0 \le i\}$ a tail of
$(s_i)_{i \in \mathcal{I}}$. A property will be said to hold eventually with
respect to a net $(s_i)_{i \in \mathcal{I}}$ if it holds for some tail of the
net.


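A.3.2 Definition A subnet $t$ of a net $s : \mathcal{I} \to X$ is a net
$t : \mathcal{J} \to X$ satisfying (i) $t = s \circ \varphi$, where $\varphi$ is
a function mapping $\mathcal{J}$ into $\mathcal{I}$, and (ii) for each
$i_0 \in \mathcal{I}$, there exists $j_0 \in \mathcal{J}$ such that
$\varphi(j) \geq i_0$ whenever $j \geq j_0$. The point $s \circ \varphi(j)$ is
often denoted by $s_{i_j}$, and we refer to the subnet
$(s_{i_j})_{j \in \mathcal{J}}$ of $(s_i)_{i \in \mathcal{I}}$.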


A.3.3 Definition Let $X$ be a topological space, and let $x \in X$. A net
$(s_i)_{i \in \mathcal{I}}$ in $X$ will be said to converge to $x$, written
$s_i \to x$ or $\lim_i s_i = x$, if, for each neighbourhood $U$ of $x$, there
exists $i_0 \in \mathcal{I}$ such that $s_i \in U$ whenever $i_0 \leq i$.
If $s_i \to x$, then we call $x$ a limit of $s_i$.


Since the singleton set $\{x\}$ is a neighbourhood of $x$ if $X$ is endowed
with the discrete topology, it follows that $s_i \to x$ in the discrete
topology if and only if $(s_i)$ is eventually constant (with value $x$).

The notion of continuous function between topological spaces is fundamen- 
tal in the subject. There are several ways of formulating this concept, but the 
following is perhaps the most intuitive. 


A.3.4 Definition Let $X$ and $Y$ be topological spaces, and suppose that
$f : X \to Y$ is a function. Then $f$ is said to be continuous at $x \in X$ if,
for each neighbourhood $V$ of $f(x)$ in $Y$, there is a neighbourhood $U$ of
$x$ in $X$ such that $f(U) \subseteq V$. We say $f$ is continuous if it is
continuous at $x$ for each $x \in X$.

The sense, mentioned earlier, in which nets can describe all basic topological
notions can now be clarified.


A.3.5 Theorem Let $X$ and $Y$ be topological spaces. Then the following
statements hold.


(a) Let $E \subseteq X$. Then $x \in \overline{E}$ if and only if there is a
net $(s_i)$ in $E$ such that $s_i \to x$.


(b) A subset $O$ of $X$ is open if and only if, whenever $x \in O$ and $(s_i)$
is a net such that $s_i \to x$, we have that $(s_i)$ is eventually in $O$.


(c) A subset $F$ of $X$ is closed if and only if, whenever $(s_i)$ is a net in
$F$ and $s_i \to x$, we have $x \in F$.


(d) A function $f : X \to Y$ is continuous at $x \in X$ if and only if,
whenever $s_i \to x$ in $X$, we have $f(s_i) \to f(x)$ in $Y$.


Proof: We include a proof of (b) here since we have specific need of the result.
Suppose that $O$ is open, that $x \in O$, and that $s_i \to x$. Then it is
clear from the definition of net convergence that $(s_i)$ is eventually in $O$.

Conversely, assuming the stated condition, we show that $O$ contains a
neighbourhood of each of its points and hence is open. Let $x \in O$, and let
$\mathcal{U}_x$ be the neighbourhood system of $x$. Let
$\mathcal{I} = \{(y, U) \mid y \in U \in \mathcal{U}_x\}$ ordered by
$(y_1, U_1) \leq (y_2, U_2)$ if and only if $U_2 \subseteq U_1$. Then it is
easy to see that the ordering $\leq$ directs $\mathcal{I}$ and also that the
net $s : \mathcal{I} \to X$ defined by $s(y, U) = y$ converges to $x$. By our
current hypothesis, this net is eventually in $O$. Let $(y_0, U_0)$ be such
that $s_{(y, U)} = y \in O$ whenever $(y_0, U_0) \leq (y, U)$. Since
$(y_0, U_0) \leq (y, U_0)$ for all $y \in U_0$, we conclude that
$x \in U_0 \subseteq O$, as required. ∎


A.4 Separation Properties and Compactness 


It is important to have sufficiently many open sets to be able to distinguish, 
in some way, between points in a topological space by means of the open sets. 
This is usually done by means of the following axioms. 


A.4.1 Definition Let X be a topological space. 


(1) We call $X$ a $T_0$-space if, whenever $x$ and $y$ are distinct points of
$X$, there is an open set containing one but not the other.


(2) We call $X$ a $T_1$-space if, whenever $x$ and $y$ are distinct points of
$X$, there is a neighbourhood of each not containing the other.


(3) We call $X$ a $T_2$-space or a Hausdorff space if, whenever $x$ and $y$ are
distinct points of $X$, there are disjoint neighbourhoods of $x$ and $y$.
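
For a finite space given by an explicit list of its open sets, the three
axioms can be checked mechanically. The following is a minimal sketch in
Python, not part of the original development; the function names and the three
example spaces on the point set {0, 1} are illustrative assumptions. It uses
the fact that a point has a neighbourhood with a given property of this kind
exactly when it has an open neighbourhood with that property.

```python
from itertools import combinations

def is_t0(points, opens):
    """T0: for each distinct pair, some open set contains exactly one of them."""
    return all(any((x in o) != (y in o) for o in opens)
               for x, y in combinations(points, 2))

def is_t1(points, opens):
    """T1: each point of a distinct pair has an open set missing the other."""
    return all(any(x in o and y not in o for o in opens) and
               any(y in o and x not in o for o in opens)
               for x, y in combinations(points, 2))

def is_t2(points, opens):
    """T2 (Hausdorff): distinct points have disjoint open sets around them."""
    return all(any(x in u and y in v and not (u & v)
                   for u in opens for v in opens)
               for x, y in combinations(points, 2))

# Hypothetical example spaces on the two-point set {0, 1}.
points = {0, 1}
sierpinski = [frozenset(), frozenset({1}), frozenset({0, 1})]
discrete = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]
indiscrete = [frozenset(), frozenset({0, 1})]

for name, opens in [("sierpinski", sierpinski),
                    ("discrete", discrete),
                    ("indiscrete", indiscrete)]:
    print(name, is_t0(points, opens), is_t1(points, opens), is_t2(points, opens))
```

In this sketch the Sierpinski-style space is $T_0$ but not $T_1$, the discrete
space satisfies all three axioms, and the indiscrete space satisfies none.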


One of the important properties of Hausdorff spaces is that stated in the 
following result. 


A.4.2 Theorem A topological space $X$ is Hausdorff if and only if every
convergent net in $X$ has a unique limit.


On the other hand, it is important that there not be too many open sets 
in a certain sense. 


A.4.3 Definition Let $X$ be a topological space. Then an open cover
$\{U_i \mid i \in \mathcal{I}\}$ of $X$ is a collection of open sets $U_i$ such
that $\bigcup_{i \in \mathcal{I}} U_i = X$. A subcover of an open cover
$\{U_i \mid i \in \mathcal{I}\}$ is a cover $\{U_j \mid j \in \mathcal{J}\}$,
where $\mathcal{J} \subseteq \mathcal{I}$. We call a topological space $X$
compact if every open cover of $X$ has a finite subcover.


A.5 Subspaces and Products 


There are several ways in which one can create new topological spaces 
from given ones. We discuss here just two of these, namely, the process of 
forming subspaces of topological spaces and the process of forming products 
of families of topological spaces. 


A.5.1 Definition Let $(X, \mathcal{T})$ be a topological space, and let
$S \subseteq X$ be a subset of $X$. Then the collection
$\mathcal{T}_S = \{S \cap O \mid O \in \mathcal{T}\}$ gives a topology on $S$,
called the relative topology or the subspace topology for $S$. The space
$(S, \mathcal{T}_S)$ is called a subspace of $(X, \mathcal{T})$ or just a
subspace of $X$.


Whenever one has a topological space $X$ and a subset $S$ of $X$, it will be
assumed that $S$ has been endowed with the subspace topology of $X$ unless
stated to the contrary. Notice that the sets $S \cap O$, where $O$ is open in
$X$, need not be open in $X$ unless $S$ itself is an open set of $X$.
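
The construction of the subspace topology, and the remark that relatively open
sets need not be open in the ambient space, can be illustrated on a small
finite example. The sketch below is only an illustration under stated
assumptions: the helper names and the three-point space are hypothetical and
chosen for brevity.

```python
def subspace_topology(opens, s):
    """Return the collection {S ∩ O | O open in X} as a set of frozensets."""
    s = frozenset(s)
    return {s & o for o in opens}

def is_topology(points, opens):
    """Check the topology axioms for a finite family of subsets of `points`."""
    points = frozenset(points)
    opens = set(opens)
    if frozenset() not in opens or points not in opens:
        return False
    # For a finite family, closure under pairwise unions and intersections
    # already gives closure under arbitrary unions and finite intersections.
    return all((u | v) in opens and (u & v) in opens
               for u in opens for v in opens)

# Hypothetical ambient space X = {1, 2, 3} with a chain of open sets.
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

S = frozenset({2, 3})          # S is not open in X
T_S = subspace_topology(T, S)

print(sorted(sorted(o) for o in T_S))
print(is_topology(S, T_S))
print(frozenset({2}) in T_S, frozenset({2}) in T)
```

Here the set {2} is open in the subspace S = {2, 3}, since it equals
S ∩ {1, 2}, but it is not open in X, in line with the remark above.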

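Now suppose that $X_i$ is a topological space for each $i$, where $i$ is an
element of some index set $\mathcal{I}$. As usual, we denote the product of the
family $\{X_i \mid i \in \mathcal{I}\}$ of sets by
$\prod_{i \in \mathcal{I}} X_i = \{f : \mathcal{I} \to \bigcup_{i \in \mathcal{I}} X_i \mid f(i) \in X_i \text{ for each } i \in \mathcal{I}\}$.
Associated with any such product are the mappings $\pi_j$, $j \in \mathcal{I}$,
where $\pi_j : \prod_{i \in \mathcal{I}} X_i \to X_j$ is defined by
$\pi_j(f) = f(j)$. Indeed, $\pi_j$ is termed the projection on the $j$-th
factor.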

There is a natural topology one can define on $\prod_{i \in \mathcal{I}} X_i$
determined by the projections as follows. Choose any finite set
$\{i_1, \ldots, i_n\}$ of elements of $\mathcal{I}$, and choose corresponding
open sets $U_{i_j}$ in $X_{i_j}$, for $j = 1, \ldots, n$. Then we take the
collection of sets of the form
$\pi_{i_1}^{-1}(U_{i_1}) \cap \cdots \cap \pi_{i_n}^{-1}(U_{i_n})$ as a base
for a topology on $\prod_{i \in \mathcal{I}} X_i$ called the product topology
or the Tychonoff product topology. Indeed, the sets $\pi_i^{-1}(U_i)$ form a
subbase for this topology, where $i \in \mathcal{I}$ and $U_i$ is an open set
in $X_i$. It is immediate that each of the projections $\pi_i$ is continuous
relative to the product topology and the given topology on the factor $X_i$.
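
For a product of two finite spaces, the base described above consists of the
sets $\pi_1^{-1}(U) \cap \pi_2^{-1}(V) = U \times V$, and the whole topology
can be generated by taking unions. The following Python sketch is an
illustration only: the helper names are hypothetical, points of the product are
represented as pairs, and the factor space is an assumed two-point example.

```python
from itertools import product

def base_sets(opens1, opens2):
    """Base of the product topology on X1 x X2: all sets U x V."""
    return {frozenset(product(u, v)) for u in opens1 for v in opens2}

def topology_from_base(base):
    """Close a finite base under unions to obtain all open sets."""
    opens = set(base) | {frozenset()}
    while True:
        new = {u | v for u in opens for v in opens} - opens
        if not new:
            return opens
        opens |= new

def projection_preimage(j, u, points):
    """Preimage of the set u under the projection onto the j-th factor."""
    return frozenset(p for p in points if p[j] in u)

# Hypothetical factor: the two-point space with open sets {}, {1}, {0, 1}.
S_points = frozenset({0, 1})
S_opens = [frozenset(), frozenset({1}), S_points]

points = frozenset(product(S_points, S_points))
prod_opens = topology_from_base(base_sets(S_opens, S_opens))

# Each projection is continuous: preimages of open sets are open.
projections_continuous = all(
    projection_preimage(j, u, points) in prod_opens
    for j in (0, 1) for u in S_opens
)
print(len(prod_opens), projections_continuous)
```

Note that the open set {(0, 1), (1, 0), (1, 1)} of this product is a union of
base sets but is not itself of the form U × V, which is why the sets U × V form
only a base rather than the whole topology.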

Subspaces of $X$ and products $\prod_{i \in \mathcal{I}} X_i$ inherit certain
properties enjoyed by $X$ and the $X_i$, respectively, as one would expect. We
summarize the ones relevant to our needs in the following theorem.


A.5.2 Theorem The following statements hold.


(a) Subspaces of $T_0$ or Hausdorff spaces are $T_0$ or Hausdorff,
respectively.


(b) If $X$ is compact and $S$ is a closed subset of $X$, then $S$ is compact
(as a topological space in its own right). If $X$ is Hausdorff and $S$ is
compact, then $S$ is a closed subset of $X$.


(c) A non-empty product $\prod_{i \in \mathcal{I}} X_i$ is $T_0$ or Hausdorff
if and only if each factor space $X_i$ is $T_0$ or Hausdorff, respectively.


(d) (Tychonoff's theorem) A non-empty product $\prod_{i \in \mathcal{I}} X_i$
is compact if and only if each factor space is compact.


(e) A net $(f_\lambda)$ in a product space $\prod_{i \in \mathcal{I}} X_i$
converges to $f$ if and only if, for each index $i \in \mathcal{I}$, we have
$\pi_i(f_\lambda) \to \pi_i(f)$ in $X_i$.


A.6 The Scott Topology 


We present here the proofs of those results which were simply stated in 
Chapter 3 concerning the Scott topology. In fact, our development constitutes 
a treatment of the Scott topology from the point of view of convergence. 
Unless stated to the contrary, $(D, \sqsubseteq)$ will denote throughout some
fixed, but arbitrary, domain with set $D_c$ of compact elements.


A.6.1 Proposition Suppose that $A \subseteq D$ is a directed set. Then $A$ is a
net in $D$, and, as a net, we have that $A \to \bigsqcup A$ in the Scott
topology. In particular, for each $s \in D$, $\mathrm{approx}(s) \to s$ in the
Scott topology.


Proof: Write $A = \{a_i \mid i \in \mathcal{I}\}$ for some index set
$\mathcal{I}$, which we identify with $A$. Then $\mathcal{I}$ is clearly
directed by the ordering $\leq$ obtained by restricting $\sqsubseteq$ to $A$.
Therefore, the inclusion map $\mathcal{I} \to D$ is a net in $D$. Let
$\hat{A} = \bigsqcup A$ and suppose that $O$ is a neighbourhood of $\hat{A}$ in
the Scott topology. Thus, $\bigsqcup A \in O$, and hence there exists some
index $i_0$ such that $a_{i_0} \in O$. But $O$ is upwards closed, and therefore
$a_i \in O$ whenever $i_0 \leq i$. Thus, $A \to \hat{A}$, as required. ∎


A.6.2 Proposition Suppose that $f : D \to E$ is continuous in the Scott
topologies on domains $D$ and $E$. Then whenever $x \in D$, $a \in D_c$, and
$a \sqsubseteq x$, we have $f(a) \sqsubseteq f(x)$.


Proof: Let $a \in D_c$. Since $f$ is continuous at $a$, given any Scott
neighbourhood $V$ of $f(a)$, there is a Scott neighbourhood $U$ of $a$ such
that $f(U) \subseteq V$. Let $b \in \mathrm{approx}(f(a))$ be arbitrary. Then
$V = \uparrow b$ is a Scott neighbourhood of $f(a)$. Furthermore, $\uparrow a$
is a Scott neighbourhood of $a$ contained in any Scott neighbourhood $U$ of
$a$. Therefore, we have $f(\uparrow a) \subseteq \uparrow b$. Thus, if
$a \sqsubseteq x$, then $x \in \uparrow a$. Therefore, $f(x) \in \uparrow b$,
that is, $b \sqsubseteq f(x)$. But $b \in \mathrm{approx}(f(a))$ is arbitrary.
Therefore, $f(a) \sqsubseteq f(x)$, as required. ∎


A.6.3 Proposition Suppose that $f : D \to E$ is continuous in the Scott
topologies on domains $D$ and $E$. Then $f$ is monotonic.


Proof: Suppose that $x \sqsubseteq y$ in $D$. Note that if
$a \in \mathrm{approx}(x)$ is arbitrary, then $a \in D_c$ and
$a \sqsubseteq x$, so that $a \sqsubseteq y$. By Proposition A.6.2, we then
have $f(a) \sqsubseteq f(y)$. Now, $\mathrm{approx}(x)$ can be thought of as a
net $\mathrm{approx}(x) = \{a_i \mid i \in \mathcal{I}\}$, as in Proposition
A.6.1, and moreover $a_i \to x$. Therefore, $f(a_i) \to f(x)$. Hence, by
Theorem 3.2.4, for each $b \in \mathrm{approx}(f(x))$ there is $i_0$ such that
$b \sqsubseteq f(a_i)$ whenever $i_0 \leq i$. But
$a_i \sqsubseteq x \sqsubseteq y$, for each $i$, and so $a_i \sqsubseteq y$ and
hence $f(a_i) \sqsubseteq f(y)$ whenever $i_0 \leq i$ by our first observation.
From this we see that $b \sqsubseteq f(y)$. Finally, we now have
$f(x) = \bigsqcup \{b \mid b \in \mathrm{approx}(f(x))\} \sqsubseteq f(y)$ so
that $f(x) \sqsubseteq f(y)$, as required. ∎


A.6.4 Proposition A function $f : D \to E$, where $D$ and $E$ are domains, is
continuous in the Scott topologies on $D$ and $E$ if and only if it is order
continuous in the sense of Definition 1.1.7.


Proof: Suppose that $f$ is continuous in the Scott topologies on $D$ and $E$.
Then $f$ is monotonic by Proposition A.6.3. Let $A \subseteq D$ be a directed
set, and let $\hat{A} = \bigsqcup A$. By Proposition A.6.1,
$A = \{a_i \mid i \in \mathcal{I}\} \to \hat{A}$ as a net, and hence
$f(a_i) \to f(\hat{A})$ by our hypothesis concerning $f$. Therefore, by Theorem
3.2.4, for each $b \in \mathrm{approx}(f(\hat{A}))$, there exists $i_0$ such
that $b \sqsubseteq f(a_i)$ whenever $i_0 \leq i$. From this we obtain
$f(\hat{A}) = f(\bigsqcup A) = \bigsqcup \{b \mid b \in \mathrm{approx}(f(\hat{A}))\} \sqsubseteq \bigsqcup \{f(a_i) \mid i \in \mathcal{I}\} = \bigsqcup f(A)$.
Thus, $f(\bigsqcup A) \sqsubseteq \bigsqcup f(A)$, and it follows that $f$ is
order continuous by the remarks following Definition 1.1.7.

Conversely, suppose that $f$ is order continuous and that $s_i \to s$ in the
Scott topology on $D$. Now, $f$ is monotonic. Therefore, on noting that
$\mathrm{approx}(s)$ is directed and thinking of it as the net
$\{a_j \mid j \in \mathcal{J}\}$, we have that the set
$\{f(a_j) \mid j \in \mathcal{J}\}$ is directed and
$f(s) = f(\bigsqcup \mathrm{approx}(s)) = \bigsqcup f(\mathrm{approx}(s)) = \bigsqcup \{f(a_j) \mid j \in \mathcal{J}\}$.
Therefore, given any $b \in \mathrm{approx}(f(s))$, there is
$j \in \mathcal{J}$ such that $b \sqsubseteq f(a_j)$, where
$a_j \in \mathrm{approx}(s)$. Since $s_i \to s$, it follows from Theorem 3.2.4
that there is $i_0$ such that $a_j \sqsubseteq s_i$ whenever $i_0 \leq i$.
Hence, by the monotonicity of $f$, we have that
$b \sqsubseteq f(a_j) \sqsubseteq f(s_i)$ whenever $i_0 \leq i$. Consequently,
we have that $f(s_i) \to f(s)$ in the Scott topology on $E$, and so $f$ is
continuous in the Scott topologies, as required. ∎
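
The equivalence in Proposition A.6.4 can be checked by brute force in the
finite case. The sketch below rests on an assumption that is not needed in the
general proof: in a finite poset every directed set contains its supremum, so
the Scott-open sets are exactly the upward-closed sets and order continuity
reduces to monotonicity. The two small domains and all function names are
hypothetical choices made for illustration.

```python
from itertools import chain, combinations, product

def up_sets(elems, leq):
    """All upward-closed subsets; in a finite poset these are the Scott opens."""
    elems = list(elems)
    subsets = chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1))
    return {frozenset(s) for s in subsets
            if all(y in s for x in s for y in elems if leq(x, y))}

def is_scott_continuous(f, d_elems, d_opens, e_opens):
    """Topological continuity: preimages of open sets are open."""
    return all(frozenset(x for x in d_elems if f[x] in v) in d_opens
               for v in e_opens)

def is_monotone(f, d_elems, d_leq, e_leq):
    """Monotonicity: x below y in D implies f(x) below f(y) in E."""
    return all(e_leq(f[x], f[y])
               for x in d_elems for y in d_elems if d_leq(x, y))

# Hypothetical flat domain D: "bot" lies below the incomparable "a" and "b".
D = ["bot", "a", "b"]
def d_leq(x, y):
    return x == y or x == "bot"

# Hypothetical two-element chain E with 0 below 1.
E = [0, 1]
def e_leq(x, y):
    return x <= y

d_opens = up_sets(D, d_leq)
e_opens = up_sets(E, e_leq)

# Enumerate every function D -> E as a dictionary.
functions = [dict(zip(D, values)) for values in product(E, repeat=len(D))]
agree = all(is_scott_continuous(f, D, d_opens, e_opens)
            == is_monotone(f, D, d_leq, e_leq) for f in functions)
monotone_count = sum(is_monotone(f, D, d_leq, e_leq) for f in functions)
print(agree, monotone_count, len(functions))
```

Of the eight functions from D to E in this example, five are monotone, and
these are precisely the ones that are continuous with respect to the two Scott
topologies.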


Finally, we consider briefly the separation and compactness properties of 
the Scott topology. 


A.6.5 Proposition When endowed with the Scott topology, any domain
$(D, \sqsubseteq)$ is a compact $T_0$ topological space, but is not $T_1$ in
general.


Proof: Suppose that $\{U_i \mid i \in \mathcal{I}\}$ is an open cover of $D$.
Then we have $\bot \in U_k$, where $U_k$ is some element of the given cover and
$\bot$ denotes the bottom element of $D$. But $\bot \sqsubseteq x$ for each
$x \in D$ and $U_k$ is upwards closed, being Scott open. Therefore,
$D \subseteq U_k$, and so $\{U_k\}$ is an open subcover of
$\{U_i \mid i \in \mathcal{I}\}$, and hence $D$ is compact.

We show next that $D$ is $T_0$. Suppose that $x, y \in D$ and $x \neq y$.
First, suppose that $x$ and $y$ are comparable, that is, either
$x \sqsubseteq y$ or $y \sqsubseteq x$; suppose for the sake of argument that
$x \sqsubseteq y$ and, hence, that $x \sqsubset y$ since $x \neq y$. We claim
that there is a compact element $a \sqsubseteq y$ such that either
$x \sqsubset a \sqsubseteq y$ or $x$ and $a$ are incomparable. If not, then for
all compact elements $a \sqsubseteq y$, we have that $x$ and $a$ are comparable
and indeed $a \sqsubseteq x$. It follows now that the supremum of such $a$ is
less than or equal to $x$, which is a contradiction since in fact this supremum
is $y$. But then, given the claim, $\uparrow a$ is a Scott neighbourhood of $y$
which does not contain $x$. Notice that if $a$ is any compact element and
$a \sqsubseteq x$, then $a \sqsubseteq y$. So, any Scott neighbourhood of $x$
contains $y$, and we see that the condition in the definition of $T_0$ is not
symmetric in this case.

Now suppose that $x$ and $y$ are incomparable. We claim this time that there is
a compact element $a \in \mathrm{approx}(x)$ such that $a$ and $y$ are
incomparable. Suppose that this is not the case, that is, suppose that for each
$a \in \mathrm{approx}(x)$, $a$ and $y$ are comparable. Certainly, it cannot be
the case that $y \sqsubseteq a$; otherwise, we immediately have
$y \sqsubseteq x$. So it must be the case that $a \sqsubseteq y$ for each
$a \in \mathrm{approx}(x)$. But then we have
$\bigsqcup \{a \mid a \in \mathrm{approx}(x)\} \sqsubseteq y$, that is,
$x \sqsubseteq y$, which is again a contradiction. Now, given this claim,
$\uparrow a$ is a Scott neighbourhood of $x$ not containing $y$. Notice that,
by symmetry, in this case we also have a Scott neighbourhood of $y$ not
containing $x$; thus, the $T_1$ property actually applies to some pairs in $D$
(the incomparable pairs), but not to all pairs. In any case, we now see that
$D$ is $T_0$.

Finally, take the two element domain $D = \{\bot, a\}$, where $\bot \sqsubset
a$. The Scott topology on $D$ contains $\emptyset$, $\uparrow \bot = D$ and
$\uparrow a = \{a\}$ as its open sets (the set $\{\bot\}$ is not Scott open).
This space $D$ is not $T_1$ since any neighbourhood of $\bot$ contains $a$. ∎